Abstract: This paper presents and discusses the results of basic source finding tests in three dimensions (using spectroscopic data cubes) with Duchamp, the standard source finder for the Australian SKA Pathfinder. For this purpose, we generated different sets of unresolved and extended HI model sources. These models were then fed into Duchamp, using a range of different parameters and methods provided by the software. The main aim of the tests was to study the performance of Duchamp on sources with different parameters and morphologies and to assess the accuracy of Duchamp's source parametrisation. Overall, we find Duchamp to be a powerful source finder capable of reliably detecting sources down to low signal-to-noise ratios and accurately measuring their position and velocity. In the presence of noise in the data, Duchamp's measurements of basic source parameters, such as spectral line width and integrated flux, are affected by systematic errors. These errors are a consequence of the effect of noise on the specific algorithms used by Duchamp for measuring source parameters, in combination with the fact that the software only takes into account pixels above a given flux threshold and hence misses part of the flux. In scientific applications of Duchamp these systematic errors would have to be corrected for. Alternatively, Duchamp could be used as a source finder only, and source parametrisation could be done in a second step using more sophisticated parametrisation algorithms.
1 Introduction

With the advent of the Square Kilometre Array (SKA; Dewdney et al. 2009) and its precursors and pathfinders, including the Australian SKA Pathfinder (ASKAP; DeBoer et al. 2009), the Karoo Array Telescope (MeerKAT; Jonas 2009), and the Aperture Tile In Focus (APERTIF; Oosterloo et al. 2009), the prospect of deep radio continuum and HI surveys of large areas of the sky demands new strategies for data reduction and analysis, given the sheer volume of the expected data streams, in particular for spectroscopic surveys.

Of particular importance is the automatic and ac- 
curate identification and parametrisation of sources 
with high completeness and reliability. Due to the 
large data volumes to be searched, source finding algo- 
rithms must be fully automated, and the once common 
practice of source finding 'by eye' will no longer be fea- 
sible. Moreover, accurate source parametrisation algo- 
rithms need to be developed to generate reliable source 
catalogues free of systematic errors, as otherwise the 
integrity of scientific results based on the survey data 
could be compromised. 

In this paper we will take a closer look at the Duchamp source finder¹ (Whiting 2011a, 2012). Duchamp has been developed by Matthew Whiting at CSIRO as a general-purpose source finder for three-dimensional data cubes as well as two-dimensional images and will serve as the default source finder in the processing of data from the ASKAP survey science projects. The software identifies sources by searching for regions of emission above a specified flux threshold. To improve its performance, Duchamp offers several methods of preconditioning and filtering of the input data, including spatial and spectral smoothing as well as reconstruction of the entire image or data cube with the help of wavelets. In addition to source finding, Duchamp provides the user with basic source parametrisation, including the measurement of position, size, radial velocity, line width, and integrated flux of a source. More information about the capabilities of the software is available from the Duchamp User Guide (Whiting 2011b). A brief overview of Duchamp's basic functionality is provided in Section 3.

So far, Duchamp's source finding and parametrisation capabilities have never been systematically tested on a large set of artificial sources with well-defined parameters. The aim of this paper is to bridge this gap by thoroughly testing the performance of Duchamp on sets of artificial point sources and galaxy models as well as a data set containing real galaxies and telescope noise. The tests were originally motivated by the need to identify suitable source finding algorithms for the Widefield ASKAP L-band Legacy All-sky Blind Survey (WALLABY; Koribalski & Staveley-Smith 2009),² one of the large, extragalactic ASKAP survey science projects currently in preparation (Westmeier & Johnston 2010). Hence, the tests presented here will focus on the detection of compact and extended HI sources, in particular galaxies, in three-dimensional data cubes with ASKAP characteristics.

¹Duchamp website: http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp/

²Principal investigators: Bärbel Koribalski and Lister Staveley-Smith; public website: http://www.atnf.csiro.au/research/WALLABY/

Table 1: Summary of the parameters used to generate the visibility data set and noise image for the point source models.

Parameter (visibility)     Value      Unit
Number of antennas         36
System temperature         50         K
Declination                -45°
Total integration time     8          h
Hour angle range           ±4         h
Cycle time                 5          s
Stokes parameters          I
Number of channels         31
Frequency                  1.42       GHz
Channel width              18.31      kHz
                           3.86       km s⁻¹

Parameter (image)          Value      Unit
Final image size           31 × 31    px
Field diameter             5          arcmin
Pixel size                 10         arcsec
Robustness
Gaussian uv taper          7.28       kλ
                           1.54       km
RMS noise                  1.95       mJy
Synthesised beam
  major axis               27.1       arcsec
  minor axis               26.7       arcsec
  position angle           87.9°

However, we believe that the results and conclu- 
sions presented in this paper will be of interest not 
only to those involved in SKA precursor science, but 
to a larger community of astronomers interested in the 
automatic detection and parametrisation of sources in 
their data sets, regardless of the wavelength range in- 
volved. For a comparison of Duchamp's performance 
with that of other source finding algorithms we refer 
to the paper by Popping et al. (2011) in this issue. 

This paper is organised as follows: in Section 2 we summarise the source finding strategies of previous large HI surveys, followed by a brief overview of the Duchamp source finder in Section 3. In Section 4 we present the outcome of our test of Duchamp on point source models with simple Gaussian line profiles. Section 5 describes our testing of Duchamp on models of disc galaxies with varying physical parameters. In Section 6 we apply Duchamp to a data cube containing real galaxies and genuine noise extracted from radio observations. A discussion of our results is presented in Section 7. Finally, Section 8 summarises our main results and conclusions.



2 Source Finding in Previous Surveys

Some previous large HI surveys, including the HI Parkes All-Sky Survey (HIPASS; Barnes et al. 2001), the HI Jodrell All-Sky Survey (HIJASS; Lang et al. 2003), and the Arecibo Legacy Fast ALFA survey (ALFALFA; Giovanelli et al. 2005), already had to deal with the issue of (semi-)automatic source detection.

In the case of HIPASS, two different source finders were used and the results combined to maximise completeness (Meyer et al. 2004). The first algorithm, MULTIFIND (Kilborn 2001), used a simple 4σ flux threshold combined with smoothing of the data cube on different scales. The second algorithm, tophat, detected sources in the spectral domain by convolving each spectrum in the data cube with a top-hat function of varying width. Neither of the two algorithms alone managed to detect more than 90% of the final, combined source list. The two algorithms combined produced about 140,000 unique detections, all of which were inspected by eye to remove potential false detections. The final HIPASS catalogue included 4315 sources (Meyer et al. 2004), implying an overall reliability of the automatic source finding algorithms of only about 3% (4315/140,000).
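The top-hat approach described above can be illustrated with a minimal Python sketch, assuming uncorrelated Gaussian noise of known rms in every channel. The kernel widths, the 4σ threshold and the function name `tophat_detect` are illustrative choices of ours, not the settings of the HIPASS tophat finder.

```python
import numpy as np

def tophat_detect(spectrum, sigma, widths=(3, 5, 9, 15), nsigma=4.0):
    """Return a boolean mask of channels where any top-hat-smoothed version of
    `spectrum` exceeds `nsigma` times the noise expected after smoothing.

    Assumes uncorrelated Gaussian noise of rms `sigma` per channel, so that
    averaging over w channels reduces the noise to sigma / sqrt(w).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    flags = np.zeros(spectrum.size, dtype=bool)
    for w in widths:
        kernel = np.ones(w) / w                      # normalised top-hat of width w
        smoothed = np.convolve(spectrum, kernel, mode="same")
        flags |= smoothed > nsigma * sigma / np.sqrt(w)
    return flags

# Usage: a 20-channel line of 2 sigma per channel hidden in unit-rms noise
rng = np.random.default_rng(1)
spec = rng.normal(0.0, 1.0, 200)
spec[90:110] += 2.0
mask = tophat_detect(spec, sigma=1.0)
```

A single channel at 2σ would never pass a 4σ cut, but averaging over 15 channels lowers the noise by a factor of about 3.9, which is why wide kernels pick up broad, faint lines.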

In the case of HIJASS, again two different methods were used and the results combined to improve completeness (Lang et al. 2003). The first method simply involved searching the cubes by eye to extract potential sources. The second algorithm, POLYFIND, first searched for signals above a given threshold in a smoothed version of the data cube and then ran a series of matched filters over the detected signals to decide whether a signal was likely genuine or false. As in the case of HIPASS, the source list produced by the automatic source finding routine was inspected by eye to further reject potential false detections. The positions of uncertain detections were re-observed to either confirm or refute them.

For the ALFALFA survey, a matched-filtering technique was applied to the data in the spectral domain (Saintonge 2007). The data were convolved with a set of template functions created by combining the first two symmetric Hermite functions, ψ₀(x) and ψ₂(x). The resulting templates range from simple Gaussian profiles for narrow signals to double-peaked profiles for broader signals, covering the range of spectral shapes expected from HI observations of real galaxies. In tests on 1500 simulated galaxies, 100% reliability and about 70% completeness are achieved at an integrated signal-to-noise ratio of S/N ≈ 6, while the 90% completeness level is exceeded at S/N > 9.³
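As a rough illustration of such templates, the sketch below builds profiles of the form ψ₀(x) + c ψ₂(x) from the normalised Hermite functions and cross-correlates a spectrum with one of them. The single non-negative weight c, the unit-norm scaling and all function names are our assumptions; the exact template family used by Saintonge (2007) may differ.

```python
import numpy as np

def psi0(x):
    # zeroth normalised Hermite function: pi^(-1/4) exp(-x^2 / 2)
    return np.pi ** -0.25 * np.exp(-x**2 / 2)

def psi2(x):
    # second normalised Hermite function: (2x^2 - 1) / (sqrt(2) pi^(1/4)) exp(-x^2 / 2)
    return (2 * x**2 - 1) / (np.sqrt(2) * np.pi ** 0.25) * np.exp(-x**2 / 2)

def template(nchan, width, c):
    """Template on nchan channels; `width` sets the scale in channels and
    c >= 0 weights psi2 (c = 0 gives a Gaussian, larger c a double-peaked profile)."""
    x = (np.arange(nchan) - (nchan - 1) / 2) / width
    t = psi0(x) + c * psi2(x)
    return t / np.sqrt(np.sum(t**2))                 # unit norm

def matched_filter(spectrum, tmpl):
    # cross-correlate; 'same' keeps the output aligned with the input channels
    return np.correlate(spectrum, tmpl, mode="same")
```

With c = 0 the template peaks at the central channel; for c above √2/5 ≈ 0.28 the centre becomes a local minimum and the profile turns double-peaked.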

³In her calculation of S/N, Saintonge (2007) makes the implicit assumption that the sources are spatially unresolved.

3 The Duchamp Source Finder

Duchamp has been implemented as a general-purpose source finder for three-dimensional spectral-line data cubes with two spatial axes and one frequency (or velocity) axis, although the software can also operate on one- and two-dimensional data sets (Whiting 2011b). Duchamp finds sources by applying a simple flux threshold to the data cube, specified by the threshold or snrCut keyword, and searching for pixels above that threshold. In a second step, the software attempts to merge detections into sources under specific circumstances that can be controlled by the user. One option is to simply merge adjacent pixels (flagAdjacent keyword). Alternatively, a maximum spatial and spectral separation can be specified for the merging of detected pixels into sources (threshSpatial and threshVelocity keywords, respectively). Once detected, sources can be "grown" to a flux level below the actual detection threshold, using the flagGrowth and growthThreshold/growthCut keywords.
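The thresholding, merging and growing steps can be sketched in Python as a hysteresis search: voxels above a low (growth) threshold are grouped into connected objects, and only objects containing at least one voxel above the detection threshold are kept. This is a simplified stand-in, not Duchamp's code; in particular we assume that only face-sharing (6-connected) voxels count as adjacent, loosely mirroring flagAdjacent, and we ignore threshSpatial and threshVelocity.

```python
import numpy as np
from collections import deque

# face-sharing neighbours in (channel, y, x); an assumed adjacency definition
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label(mask):
    """Label 6-connected groups of True voxels in a 3D boolean array (BFS)."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                p = (z + dz, y + dy, x + dx)
                inside = all(0 <= p[i] < mask.shape[i] for i in range(3))
                if inside and mask[p] and not labels[p]:
                    labels[p] = n
                    queue.append(p)
    return labels, n

def find_sources(cube, threshold, growth_threshold=None):
    """Detect voxels above `threshold`, merge adjacent ones and, if requested,
    grow each object down to `growth_threshold`. Returns voxel-index arrays."""
    low = threshold if growth_threshold is None else growth_threshold
    labels, n = label(cube > low)
    seeds = cube > threshold
    return [np.argwhere(labels == k) for k in range(1, n + 1)
            if np.any(seeds[labels == k])]
```

Growing in this way adds faint neighbouring voxels to an existing detection but never creates a new source from voxels that lie entirely below the detection threshold.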

Basic removal of false detections is achieved by requiring that sources comprise a minimum number of contiguous spatial pixels and spectral channels, using the minPix and minChannels keywords, respectively. To improve the reliability of the source finding even further, Duchamp offers a powerful method of reconstructing the entire input data cube with the help of wavelets. Source finding is then performed on the reconstructed cube instead of the original input cube. Reconstruction can be carried out either in all three dimensions of the cube, or in the spatial (two-dimensional) or spectral (one-dimensional) domain only.
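A minimal sketch of this kind of size-based rejection is shown below, assuming a detection is given as a list of (channel, y, x) voxel indices. Counting distinct spatial pixels and the longest run of consecutive channels is our reading of the criterion; Duchamp's exact definitions may differ in detail (see Whiting 2011b).

```python
import numpy as np

def passes_size_cuts(voxels, min_pix=5, min_channels=3):
    """Keep a detection if it covers at least `min_pix` distinct spatial pixels
    and at least `min_channels` consecutive spectral channels.

    voxels: sequence of (channel, y, x) integer indices of one detection.
    """
    voxels = np.asarray(voxels)
    n_spatial = len({(y, x) for _, y, x in voxels})
    chans = np.unique(voxels[:, 0])
    # length of the longest run of consecutive channel indices
    longest = run = 1
    for a, b in zip(chans[:-1], chans[1:]):
        run = run + 1 if b == a + 1 else 1
        longest = max(longest, run)
    return n_spatial >= min_pix and longest >= min_channels
```

Using the defaults above (5 pixels, 3 channels, the values later adopted in Table 3), a single bright noise spike confined to one or two channels is rejected regardless of its amplitude.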

Duchamp uses the so-called 'à trous' wavelet reconstruction method (Starck & Murtagh 2002). First, the input data set is convolved with a specific wavelet filter function (Duchamp offers the user three different functions). The difference between the convolved data set and the original data set is then added to the reconstructed cube. Next, the scale of the filter function is doubled and the procedure repeated, using the convolved array as the new input data set. Once the user-specified maximum filter scale is reached, the final convolved data set is added to the reconstructed cube, and source finding on the reconstructed data set commences.
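Schematically, and in one dimension for brevity, the recursion just described can be written as follows, with c₀ the input data, h the discrete filter and j = 1 to J the scale index (the notation is ours):

```latex
c_j[k] = \sum_{l} h[l]\, c_{j-1}\!\left[k + 2^{\,j-1} l\right], \qquad
w_j = c_{j-1} - c_j, \qquad j = 1, \dots, J,

c_0 = c_J + \sum_{j=1}^{J} w_j .
```

Because the sum telescopes, an unthresholded reconstruction returns the input exactly; any noise suppression comes only from the choice of scales and from thresholding the coefficients w_j.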

The 'à trous' wavelet reconstruction of the data cube offers a powerful method of enhancing Duchamp's source finding capabilities. First of all, the user can select the minimum (scaleMin keyword) and maximum (scaleMax keyword) filter scales to be used in the reconstruction, providing efficient suppression of small-scale and large-scale artefacts in the data, such as noise peaks, baseline ripples, or radio-frequency interference. Furthermore, the user can specify an additional threshold (snrRecon keyword) to be applied when adding wavelet components to the reconstructed data cube, thereby reducing even further the number of spurious signals in the data cube.
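The following Python sketch combines the recursion with the two controls described above: wavelet scales below a minimum are discarded, and the remaining coefficients are kept only where they exceed snr_recon times a noise estimate for that scale. It is one-dimensional, uses the B3-spline filter (a common 'à trous' choice that we assume here, without claiming it is Duchamp's default) and estimates the noise per scale from the median absolute deviation; Duchamp's own implementation differs in these details.

```python
import numpy as np

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0    # B3-spline filter (assumed choice)

def smooth(c, step):
    """Convolve 1D array c with the B3 filter whose taps are `step` samples
    apart (the 'holes'), using reflective boundaries."""
    pad = 2 * step
    padded = np.pad(c, pad, mode="reflect")
    out = np.zeros(c.size)
    for i, h in enumerate(B3):
        start = pad + (i - 2) * step
        out += h * padded[start:start + c.size]
    return out

def atrous_reconstruct(data, n_scales=4, scale_min=1, snr_recon=3.0):
    """Keep wavelet coefficients at scales >= scale_min whose magnitude exceeds
    snr_recon times a MAD-based noise estimate, then add the final smoothed array."""
    c = np.asarray(data, dtype=float)
    recon = np.zeros_like(c)
    for j in range(1, n_scales + 1):
        c_next = smooth(c, 2 ** (j - 1))            # filter scale doubles each step
        w = c - c_next                               # wavelet coefficients at scale j
        noise = np.median(np.abs(w - np.median(w))) / 0.6745
        if j >= scale_min:
            recon += np.where(np.abs(w) > snr_recon * noise, w, 0.0)
        c = c_next
    return recon + c

rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 1.0, 512)
denoised = atrous_reconstruct(noisy, scale_min=2, snr_recon=3.0)
```

Setting snr_recon = 0 and scale_min = 1 reproduces the input, which is a convenient sanity check of the decomposition.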

In comparison to simple data thresholding, the 'à trous' wavelet reconstruction method greatly increases the completeness and reliability of Duchamp's source finding procedure, and hence the method has been applied in all source finding tests presented in this paper.
Figure 1: Outline of the procedure used to create the model data cubes of point sources with Miriad.

4 Point Sources with Gaussian Spectral Profiles

For our first test of Duchamp we generated models of 1024 point sources with simple Gaussian spectral line profiles. This allows us to assess the fundamental performance of Duchamp under ideal conditions and to investigate the accuracy of the software's source parametrisation algorithms. Point sources with Gaussian profiles are ideal for this test because, as a consequence of their simple morphology, their physical parameters can be exactly defined and calculated to serve as a benchmark for Duchamp's parametrisation.

In order to create the model data set, the Miriad 
(Sault, Teuben, & Wright 1995) task uvgen was em- 
ployed to generate visibility data of Gaussian noise at a 
frequency of 1.4 GHz with ASKAP characteristics and 
parameters similar to those anticipated for the WAL- 
LABY survey. The model parameters are summarised 
in Table 1. 

The visibility data were Fourier-transformed using Miriad's task invert to generate a noise image of 600 × 600 pixels and 31 spectral channels with characteristics similar to WALLABY (again, see Table 1 for details). The RMS noise level in this image is σ = 1.95 mJy, which is only slightly higher than the 1.6 mJy expected for WALLABY.

Figure 2: Example of a point source model generated for testing Duchamp. The left-hand panel shows a single-channel map of the data cube (at the systemic velocity of the source), and the right-hand panel depicts the spectrum at the source position. The circle in the map illustrates the half-power beam width.

In order to generate images of point sources, the Miriad task imgen was used to create 1024 data cubes, each of which has a size of 31 × 31 pixels and 31 spectral channels and contains a single point source with a Gaussian spectral line profile in the centre. Each source was randomly assigned a peak flux in the range of 1 to 20σ, resulting in an average of about 54 sources per 1σ interval. Spectral line widths (FWHM) range from 0.1 to 10 spectral channels, equivalent to approximately 0.4 to 38.6 km s⁻¹, resulting in a density of about 27 sources per 1 km s⁻¹ line width interval. While in reality sources with HI line widths as small as 0.4 km s⁻¹ will not exist, the reason for including such narrow lines in our test is to study the performance of Duchamp on sources that are spectrally unresolved, irrespective of the absolute line width.
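The quoted densities follow from simple arithmetic. The check below assumes the HI rest frequency of 1.420405751 GHz (Table 1 gives 1.42 GHz) and the non-relativistic conversion Δv = cΔν/ν for the channel width.

```python
C_KMS = 299792.458             # speed of light [km/s]
NU_HI = 1.420405751e9          # HI rest frequency [Hz] (assumed reference frequency)
DNU = 18.31e3                  # channel width [Hz], Table 1

dv = C_KMS * DNU / NU_HI                        # channel width in velocity [km/s]
n_src = 1024
density_snr = n_src / (20 - 1)                  # sources per 1-sigma peak-flux interval
w_min, w_max = 0.1 * dv, 10 * dv                # FWHM range [km/s]
density_width = n_src / (w_max - w_min)         # sources per 1 km/s of line width

print(f"channel width {dv:.2f} km/s, {density_snr:.1f} sources per sigma, "
      f"FWHM {w_min:.1f} to {w_max:.1f} km/s, {density_width:.1f} sources per km/s")
```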

Each of the 1024 cubes was convolved with the beam model produced by invert. Next, a random portion of 31 × 31 pixels of the original noise cube was selected and added to each convolved image to create the final images used for testing Duchamp. To facilitate correct integrated flux measurements, we added information on the synthesised beam to the image header. The entire procedure is outlined in Figure 1. An example image and spectrum of one of the point source models is shown in Figure 2.
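For readers without Miriad, the numpy sketch below produces a comparable model cube: a Gaussian line profile in the central pixel, spatially shaped by a circular Gaussian beam of 2.7 pixels FWHM (about 27 arcsec at 10 arcsec per pixel, Table 1), plus noise. It is a simplified stand-in for the uvgen/invert/imgen procedure under stated assumptions: the noise here is white rather than the spatially correlated interferometric noise used in our tests, the line is sampled at channel centres, and the function name is hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def point_source_cube(peak, fwhm_chan, n=31, beam_fwhm_pix=2.7, noise_rms=1.95e-3, rng=None):
    """Return an (n, n, n) cube (channel, y, x) in Jy/beam with a point source of
    peak flux density `peak` and Gaussian line FWHM `fwhm_chan` (in channels) in
    the central pixel, plus white Gaussian noise of rms `noise_rms`."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = np.arange(n) - n // 2                          # 0 at the central pixel/channel
    s_chan = fwhm_chan * FWHM_TO_SIGMA
    s_pix = beam_fwhm_pix * FWHM_TO_SIGMA
    spectrum = peak * np.exp(-offsets**2 / (2 * s_chan**2))  # sampled at channel centres
    beam = np.exp(-(offsets[:, None]**2 + offsets[None, :]**2) / (2 * s_pix**2))  # peak 1
    model = spectrum[:, None, None] * beam[None, :, :]       # point source convolved with beam
    return model + rng.normal(0.0, noise_rms, model.shape)

# Usage: a 10-sigma source with a 5-channel FWHM line
cube = point_source_cube(10 * 1.95e-3, 5.0, rng=np.random.default_rng(42))
```

Because the beam is normalised to a peak of 1, the peak of the noise-free model in Jy/beam equals the point-source flux density, which mirrors how a point source appears in a restored interferometric image.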

4.1 Running Duchamp 

Next, we ran Duchamp (version 1.1.8) on the data cubes. In order to find out which combination of control parameters provided the best performance in terms of completeness and reliability, we first ran Duchamp several times with different flux thresholds and minimum wavelet scales. An overview of the completeness and reliability levels achieved in these runs as a function of the integrated flux of the source is shown in Figure 3.

We then selected the best set of control parameters for the analysis presented in this section. In this best-performing run (number 5 in Figure 3 and Table 2) we used a 1.5σ flux threshold, equivalent to 2.9 mJy. In addition, we made use of Duchamp's 'à trous' wavelet reconstruction. We employed a full three-dimensional wavelet reconstruction with a minimum wavelet scale of 2 (i.e. the smallest scales were excluded to suppress noise in the reconstructed cube) and a flux threshold of 3σ for wavelet components to be included in the reconstructed cube. In addition, we required sources to cover a minimum of 5 contiguous pixels in the image domain and 3 contiguous spectral channels above the detection threshold to be included in the final source catalogue. This further reduces the number of spurious detections. The Duchamp input parameters explicitly set in the parameter file are listed in Table 3.
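A minimal way of generating such a parameter file for each model cube is sketched below. We assume a plain 'keyword value' line format; the imageFile entry and its file name are placeholders we assume for illustration, while the remaining keywords and values are those of Table 3.

```python
# Values from Table 3; every other Duchamp parameter keeps its default.
PARAMS = {
    "imageFile": "model_0001.fits",   # hypothetical input cube name (placeholder)
    "threshold": 0.0029265,           # 1.5 x RMS
    "minPix": 5,
    "minChannels": 3,
    "flagAdjacent": True,
    "flagATrous": True,               # wavelet reconstruction
    "reconDim": 3,                    # in 3 dimensions
    "snrRecon": 3,
    "scaleMin": 2,
}

def format_parameters(params):
    """Render parameters as 'keyword value' lines (assumed file layout);
    Python booleans are written as lower-case true/false."""
    lines = []
    for key, value in params.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key} {text}")
    return "\n".join(lines) + "\n"

text = format_parameters(PARAMS)
print(text)
```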

The 1024 output parameter files generated by Du- 
champ were concatenated, and those source entries 
whose positions were within ±1 pixel of the nominal 
source position were considered as genuine detections 
and selected for further processing and analysis. The 
results of this analysis will be presented and discussed 
in the following sections. 
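The cross-matching step can be sketched as follows, assuming detections are available as (cube index, x, y) tuples in zero-based pixel coordinates, so that the central pixel of a 31 × 31 cube is (15, 15). The function name and input layout are hypothetical; multiple detections near the centre of the same cube are counted as a single genuine source.

```python
def classify(detections, centre=(15.0, 15.0), tol=1.0):
    """Split detections into genuine ones (within +/- tol pixels of the nominal
    source position in both x and y) and false positives.

    detections: iterable of (cube_id, x, y) in zero-based pixel coordinates.
    Returns (set of cube ids with a genuine detection, number of false detections).
    """
    genuine, n_false = set(), 0
    for cube_id, x, y in detections:
        if abs(x - centre[0]) <= tol and abs(y - centre[1]) <= tol:
            genuine.add(cube_id)
        else:
            n_false += 1
    return genuine, n_false

ids, n_false = classify([(0, 15.2, 14.9), (0, 3.0, 4.0), (1, 16.0, 16.0), (2, 17.5, 15.0)])
```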

For a number of reasons it is not possible to specify the typical time it takes for Duchamp to process a certain amount of data. Firstly, the performance of Duchamp strongly depends on the exact choice of input parameters, including detection threshold, wavelet reconstruction choices, or settings related to merging and discarding of initial detections. Three-dimensional wavelet reconstruction of the input data cube, for example, is particularly computationally expensive. Secondly, the running time of Duchamp on a particular data cube will depend on a large number of details, including the number of sources in the cube, their size and morphology, and in principle even the number density of sources in the cube. Thirdly, recent updates of the software have resulted in a significant improvement of Duchamp's performance, in particular compared to version 1.1.8 used for part of the testing presented in this paper.

To get a basic idea of the impact of the aforementioned parameters on the running time of Duchamp, we performed a few simple tests on a standard laptop computer with a state-of-the-art, dual-core 2.3 GHz CPU (only one core at a time was actually engaged) and 4 GB of physical memory. We ran Duchamp several times with different parameters on our artificial noise data cube of 600 × 600 spatial pixels and 31 spectral channels without any sources in it. Using a 5σ detection threshold, Duchamp takes about 0.64 s of CPU time to run, producing no detections. When performing a one-dimensional wavelet reconstruction in the spectral dimension prior to source finding, the running time increases by a factor of 30 to about 19 s. Full three-dimensional wavelet reconstruction is slower by a factor of 120, requiring about 77 s to complete.

www.publish.csiro.au/journals/pasa

[Figure 3 plot: completeness versus integrated flux (0.0 to 1.0 Jy km s⁻¹) for Duchamp runs 1 to 8; overall reliabilities given in the legend: run 1, 63.0%; run 2, 97.0%; run 3, 99.1%; run 4, 99.2%; run 5, 63.2%; run 6, 97.0%; run 7, 98.6%; run 8, 6.3%.]

Figure 3: Completeness as a function of integrated flux for different tests of Duchamp with varying control parameters. The parameters employed in the different runs are listed in Table 2. The overall reliability for each run is listed in the legend.

As mentioned before, these numbers strongly depend on the number and nature of sources present in the cube. Decreasing the flux threshold to 3σ without wavelet reconstruction results in 966 detections (all of which are noise peaks) and increases the running time of Duchamp to about 3.5 s. Processing time will also increase with cube size. Doubling the cube size to 62 spectral channels increases the running time by a factor of 2 without wavelet reconstruction, but by factors of 1.9 and 2.6 in the case of one-dimensional and three-dimensional reconstruction, respectively, indicating that an increase in cube size does not translate into a proportional increase in processing time when dealing with wavelet reconstruction.

In summary, the time Duchamp needs to process 
a data cube is a complicated function of not only the 
machine specifications (e.g. CPU, memory, data trans- 
fer speed), but also the input parameters (e.g. flux 
threshold, wavelet reconstruction) and properties of 
the data set concerned (e.g. cube dimensions, number 
of sources). Hence, running times are almost impos- 
sible to predict and may have to be determined ex- 
perimentally on a case-by-case basis. Instead of ask- 
ing whether Duchamp is fast enough for a particular 
problem, the user would have to determine the opti- 
mal set of conditions that would allow processing of 
the data in a given period of time. An alternative op- 
tion would be to separate the problem into multiple, 
parallel processes. 
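One way of splitting a cube for parallel processing is sketched below: the spatial plane is cut into overlapping tiles, each of which could be written out and searched by a separate Duchamp process while keeping the full spectral axis. The tile size, overlap and function names are free choices we introduce for illustration; the overlap should be at least as large as the biggest expected source, and detections falling in the overlap zones would have to be de-duplicated afterwards.

```python
from itertools import product

def tiles(n, size, overlap):
    """(start, stop) index pairs of overlapping tiles that cover range(n)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [(s, min(s + size, n)) for s in range(0, max(n - overlap, 1), step)]

def subcubes(shape, size, overlap):
    """Yield slices of spatial tiles that keep the full spectral axis (axis 0)."""
    nchan, ny, nx = shape
    for (y0, y1), (x0, x1) in product(tiles(ny, size, overlap), tiles(nx, size, overlap)):
        yield (slice(0, nchan), slice(y0, y1), slice(x0, x1))

# Usage: the 31 x 600 x 600 noise cube in 200-pixel tiles with a 20-pixel overlap
regions = list(subcubes((31, 600, 600), size=200, overlap=20))
```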

4.2 Results 

4.2.1 Completeness and Reliability 

Two of the most important parameters in the charac- 
terisation of source finder performance are complete- 
ness and reliability. Completeness is defined as the 
number of genuine detections divided by the true num- 
ber of sources present in the data. Completeness can 
either be calculated for the entire sample or more sen- 
sibly for a subset, e.g. for sources within a certain pa- 
rameter range. Reliability is defined as the number of 
genuine detections divided by the total number of de- 
tections produced by the source finder. Reliability can 
only be calculated for the entire sample of sources and 
not for a subset of sources within a certain parameter 
range, because false detections do not possess physical 
parameters as such. Alternatively, the parameters de- 
rived by the parametrisation algorithm of the source 
finder can be used to derive reliability as a function 
of different source parameters, but it is important to 
note that for genuine sources those parameters can be 
affected by systematic errors and do not necessarily 
correspond to the original source parameters. 
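Binned completeness curves can be computed as in the following sketch, given the true parameter of every input source and a flag indicating whether it was recovered. Passing the measured parameters and genuine/false flags of all detections gives binned reliability instead, subject to the caveat above. Function and argument names are our own.

```python
import numpy as np

def binned_completeness(true_values, detected, edges):
    """Fraction of input sources recovered in each bin of a true parameter.

    true_values : parameter (e.g. peak S/N) of every input source
    detected    : boolean flag per input source, True if genuinely detected
    edges       : bin edges passed to np.histogram
    Bins without any input source are returned as NaN.
    """
    true_values = np.asarray(true_values, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    n_all, _ = np.histogram(true_values, bins=edges)
    n_det, _ = np.histogram(true_values[detected], bins=edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_all > 0, n_det / n_all, np.nan)
```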

The ideal source finder would produce a completeness and reliability of 100%. In reality, however, we will have to find a compromise between good completeness and good reliability. In the case of Duchamp, for example, decreasing the flux threshold for detections will lead to an increase in completeness, but at the cost of lower reliability.

Table 2: Relevant input parameters for the different test runs of Duchamp used to find the optimal set of control parameters (see Figure 3). The best-performing parameter set (run 5) was then used for the analysis presented in this paper.

Run    threshold (σ)    scaleMin
1      1.5              1
2      2.0              1
3      2.5              1
4      3.0              1
5      1.5              2
6      2.0              2
7      1.0              3
8      0.5              3

Table 3: Duchamp input parameters (Whiting 2011b) explicitly set in the input parameter file for point source models. The default values of Duchamp were used for all other parameters.

Parameter       Value        Comment
threshold       0.0029265    1.5 × RMS
minPix          5
minChannels     3
flagAdjacent    true
flagATrous      true         Wavelet reconstr.
reconDim        3            in 3 dimensions
snrRecon        3
scaleMin        2

Figure 4: Top panel: Completeness of the point source models as a function of true peak signal-to-noise ratio in bins of 1σ. Middle panel: Same, but as a function of true integrated flux in bins of 0.1 Jy km s⁻¹. Bottom panel: Same, but as a function of true line width (FWHM) in bins of 2.5 km s⁻¹.

Figure 5: Top panel: Reliability of the point source models as a function of measured peak signal-to-noise ratio in bins of 1σ. Middle panel: Same, but as a function of measured integrated flux in bins of 0.05 Jy km s⁻¹. Bottom panel: Same, but as a function of measured line width (w50) in bins of 2.5 km s⁻¹.

In our test of Duchamp on the set of 1024 point source models the software finds a total of 1103 sources, of which 850 are genuine detections. The remaining 253 detections are false positives due to strong noise peaks in the data cube. These numbers translate into an overall completeness of 83.0% and an overall reliability of 77.1%.⁴

⁴Note that these numbers differ slightly from the ones quoted for run 5 in Figure 3 because a different realisation of the model was used in the initial tests. Reliability values will generally depend on the characteristics of the data cube under consideration (e.g. the size of the cube) and are therefore difficult to assess and compare.
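The overall figures follow directly from the definitions of completeness and reliability given in Section 4.2.1:

```python
n_input = 1024       # point source models
n_detected = 1103    # all Duchamp detections
n_genuine = 850      # detections matched to a model source

completeness = n_genuine / n_input      # genuine detections / true sources
reliability = n_genuine / n_detected    # genuine detections / all detections
print(f"completeness = {100 * completeness:.1f}%, reliability = {100 * reliability:.1f}%")
```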
Figure 6: Left-hand panel: Position error of the point source models in right ascension and declination. Right-hand panel: Mean position error (black data points) and corresponding standard deviation (error bars) as a function of true peak signal-to-noise ratio in 1σ bins.



Completeness as a function of peak signal-to-noise ratio is plotted in the top panel of Figure 4. The detection list produced by Duchamp is complete down to a peak flux level of F_peak ≈ 10σ, but below that level completeness decreases to below 50% at about 3σ. The completeness curve shows a much steeper rise when plotted against integrated flux instead of peak flux (middle panel of Figure 4). The 100% completeness level is reached at F_int ≈ 0.3 Jy km s⁻¹, corresponding to an HI mass of about 7 × 10⁴ M☉ at a distance of 1 Mpc, or 7 × 10⁸ M☉ at 100 Mpc, for the expected 8-hour integration per pointing of the WALLABY project on ASKAP. Below that flux level there is a sharp drop in completeness.
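The quoted masses follow from the standard optically thin conversion M_HI = 2.356 × 10⁵ (D/Mpc)² (F_int/Jy km s⁻¹) M☉, sketched below.

```python
def hi_mass(flux_jy_kms, distance_mpc):
    """HI mass in solar masses for an optically thin source:
    M_HI = 2.356e5 * D^2 * F_int, with D in Mpc and F_int in Jy km/s."""
    return 2.356e5 * distance_mpc ** 2 * flux_jy_kms

for d in (1.0, 100.0):
    print(f"D = {d:5.0f} Mpc: M_HI = {hi_mass(0.3, d):.1e} M_sun")
```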

The bottom panel of Figure 4 shows completeness plotted as a function of true line width (FWHM), irrespective of the peak flux and integrated flux of a source. Over most of the covered line width range the completeness remains constant at approximately 90%, but it gradually decreases to about 50% below line widths of 10 km s⁻¹. This decrease is presumably the result of the 'à trous' wavelet reconstruction of the data cube. By ignoring the smallest wavelet scales in the reconstruction we suppress the detection of noise peaks, but at the same time we are also less sensitive to genuine sources with narrow spectral lines.

Reliability as a function of measured peak signal-to-noise ratio, measured integrated flux, and measured line width (w50) is plotted in the top, middle, and bottom panels of Figure 5. Duchamp achieves 100% reliability at a peak signal-to-noise ratio of about 5 and an integrated flux level of approximately 0.1 Jy km s⁻¹. Reliabilities range between about 80% and 100% over most of the covered line width range, but drop significantly for sources with narrow lines of w50 ≲ 15 km s⁻¹ due to the increasing number of false detections associated with noise peaks. For line widths of less than about 5 km s⁻¹ the reliability increases sharply, because Duchamp effectively filters narrow signals caused by noise peaks through wavelet filtering and minimum channel requirements.



However, as discussed earlier, reliability calculations are very difficult to assess and should be approached with great caution. First of all, reliability can
only be specified as a function of measured source pa- 
rameters because false detections do not have genuine 
physical parameters. Any errors in a source finder's 
parametrisation will therefore affect the calculated re- 
liability curves. Secondly, the actual reliability num- 
bers are entirely meaningless in the case of model data 
as they depend on how the sources were distributed 
across the model data cube. Increasing the volume 
of the cube (without increasing the number of sources 
therein) will result in lower reliabilities as the fraction 
of false detections increases. Consequently, reliabilities 
can only be compared on a relative scale, e.g. when 
testing different source finders on the same data set to 
determine which algorithm performs best. 
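A toy calculation illustrates the volume dependence, assuming that the number of false detections scales linearly with cube volume while the number of genuine detections stays fixed; the counts are those of our point source test, and the function name is our own.

```python
def scaled_reliability(n_genuine, n_false, volume_factor):
    """Reliability if the cube volume is multiplied by `volume_factor`, assuming
    false detections scale linearly with volume and genuine ones stay fixed."""
    return n_genuine / (n_genuine + volume_factor * n_false)

for f in (1, 2, 4):
    print(f"volume x{f}: reliability = {100 * scaled_reliability(850, 253, f):.1f}%")
```

Doubling the volume in this toy model lowers the reliability from about 77% to about 63% without any change in the source finder itself.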

4.2.2 Source Position 

The resulting position errors are plotted in the left- 
hand panel of Figure 6. Duchamp does an excellent 
job in determining accurate source positions, with a 
mean position error of 0.0±1.6 arcsec in right ascension 
and 0.1 ± 1.5 arcsec in declination. 

The mean position error (in terms of angular separation from the nominal source position) as a function of peak signal-to-noise ratio, in bins of 1σ, is shown in the right-hand panel of Figure 6. For bright sources of F_peak ≈ 20σ the mean position error is approximately 1 arcsec, increasing to about 5 arcsec for F_peak ≈ 3σ. These numbers correspond to only about 4% and 19%, respectively, of the FWHM of the synthesised beam.

Two limitations should be noted at this point. First of all, in our models the source was always placed exactly on the central pixel of the data cube. We did not explicitly test placement of sources at positions in between the grid points of the cube, which, in the case of point sources, could result in reduced detection rates and less accurate source positions. Secondly, as with other source parameters, source positions will be inaccurate in cases where two or more sources are confused.

Figure 7: Left-hand panel: Histogram of radial velocity errors (black curve) of the point source models in bins of 0.1 km s⁻¹. The red, dashed curve is the result of a Gaussian fit to the histogram. Right-hand panel: Standard deviation of the velocity error of the sources as a function of true peak signal-to-noise ratio in bins of 1σ.

4.2.3 Radial Velocity 

An overall histogram of radial velocity errors derived
from the Duchamp run is shown in the left-hand panel
of Figure 7. As expected, velocity errors have an approximately Gaussian distribution centred on zero. The
mean velocity error of all sources is 0.0 ± 1.7 km s⁻¹.
The red, dashed curve in Figure 7 shows the result of
a Gaussian fit to the histogram. While the overall distribution of velocity errors follows the fitted Gaussian
function, there are a few significant deviations, namely
a somewhat higher and sharper peak in the centre
(which is slightly shifted into the negative range) and
conspicuous 'wings' between 2 and 3 km s⁻¹ (both positive and negative) where source counts are systematically too high with respect to the fit. The FWHM
of the fitted Gaussian is 1.94 ± 0.04 km s⁻¹, and the
centroid is −0.026 ± 0.017 km s⁻¹, which deviates from
zero by about 1.5σ, reflecting the aforementioned negative offset of the peak of the histogram. These deviations from a pure Gaussian distribution are possibly
caused by digitisation effects related to the segmentation of the frequency axis into discrete channels of 18.3 kHz,
equivalent to 3.86 km s⁻¹.
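As a rough illustration of how a centroid and FWHM like those quoted above can be extracted from a histogram of velocity errors, the sketch below fits a Gaussian to binned errors. It is a minimal sketch under stated assumptions: it uses NumPy only and substitutes a weighted parabola fit to the logarithm of the counts (Caruana's method) for a full non-linear least-squares fit; the function name `fit_gaussian_to_histogram`, the `min_count` cut, and the synthetic input errors are all hypothetical and are not the paper's data or code.

```python
import numpy as np

# FWHM of a Gaussian in units of its standard deviation: 2*sqrt(2 ln 2)
FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))


def fit_gaussian_to_histogram(errors, bin_width=0.1, min_count=5):
    """Return (centroid, FWHM) of a Gaussian fitted to a histogram of errors.

    Assumption: a weighted quadratic fit to ln(counts) stands in for a full
    non-linear fit. For a Gaussian, ln N(x) = a + b*x + c*x**2 with
    c = -1/(2 sigma^2) and centroid = -b/(2c).
    """
    errors = np.asarray(errors, dtype=float)
    lo = errors.min()
    n_bins = max(1, int(np.ceil((errors.max() - lo) / bin_width)))
    edges = lo + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(errors, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])

    # Skip sparse bins: ln(0) is undefined and low counts are very noisy.
    good = counts >= min_count
    x, y = centres[good], counts[good].astype(float)

    # Var(ln y) ~ 1/y for Poisson counts, so weight residuals by sqrt(y).
    c, b, _a = np.polyfit(x, np.log(y), 2, w=np.sqrt(y))
    if c >= 0:
        raise ValueError("histogram is not peaked; Gaussian fit failed")
    sigma = np.sqrt(-1.0 / (2.0 * c))
    centroid = -b / (2.0 * c)
    return centroid, FWHM_PER_SIGMA * sigma


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    fake_errors = rng.normal(loc=-0.03, scale=0.82, size=20000)  # synthetic
    centroid, fwhm = fit_gaussian_to_histogram(fake_errors)
    print(f"centroid = {centroid:+.3f} km/s, FWHM = {fwhm:.2f} km/s")
```

A log-parabola fit is exact for a noiseless Gaussian and needs no starting guesses; its weakness is sensitivity to sparse tail bins, which is why bins below `min_count` are dropped.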

The standard deviation of the radial velocity error
as a function of peak signal-to-noise ratio in 1σ bins
is shown in the right-hand panel of Figure 7. As expected, the standard deviation about the mean (which
is essentially zero for all peak flux intervals) increases
with decreasing peak flux. While for bright sources of
F_peak ≈ 20σ the standard deviation is below 1 km s⁻¹,
it increases to almost 6 km s⁻¹ for faint sources near
the 3σ level.
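The binned statistic plotted in the right-hand panel of Figure 7 can be sketched as follows: errors are grouped into 1σ-wide bins of true peak signal-to-noise ratio and a sample standard deviation is computed per bin. The function name `binned_std` and the toy error model in the demo (scatter falling as 1/(S/N)) are hypothetical assumptions used only for illustration.

```python
import numpy as np


def binned_std(snr, errors, bin_width=1.0, min_per_bin=2):
    """Sample standard deviation of `errors` in bins of true peak S/N.

    Returns a dict mapping bin centre -> standard deviation; bins with
    fewer than `min_per_bin` entries are skipped.
    """
    snr = np.asarray(snr, dtype=float)
    errors = np.asarray(errors, dtype=float)
    index = np.floor(snr / bin_width).astype(int)
    result = {}
    for k in np.unique(index):
        in_bin = errors[index == k]
        if in_bin.size >= min_per_bin:
            result[float((k + 0.5) * bin_width)] = float(np.std(in_bin, ddof=1))
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    snr = rng.uniform(3.0, 20.0, 50000)
    # Toy error model (an assumption, not the paper's data):
    # scatter shrinks as 1/(S/N).
    errors = rng.normal(0.0, 15.0 / snr)
    for centre, std in sorted(binned_std(snr, errors).items()):
        print(f"S/N {centre:5.1f}: std = {std:.2f} km/s")
```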



4.2.4 Line Width 

Figure 8 shows the ratio of measured line width versus true line width (FWHM of the original Gaussian
model) as a function of peak signal-to-noise ratio in
bins of 1σ. Duchamp determines three different types
of line width: w_50 is the full width at 50% of the peak
flux, w_20 is the full width at 20% of the peak flux,
and w_vel is the full detected line width of the source,
i.e. the width across all channels with detected flux.
For a Gaussian line, w_50 is equivalent to the FWHM,
and the ratio w_50/FWHM should therefore be 1.
The relation between w_20 and w_50 in the case of a
Gaussian line is given by the constant factor

$$\frac{w_{20}}{w_{50}} = \sqrt{\frac{\ln 5}{\ln 2}} \approx 1.52. \qquad (1)$$

Finally, the relation between w_vel and w_50, again assuming a Gaussian line profile, is defined via

$$\frac{w_{\rm vel}}{w_{50}} = \sqrt{\frac{\ln(F_{\rm peak}/F_{\rm thr})}{\ln 2}}, \qquad (2)$$

where F_thr = n × σ is the flux threshold used in the calculation of w_vel. These theoretical relations are plotted in Figure 8 as the dashed lines for w_50 (black), w_20
(red), and w_vel (blue; for F_thr = 1.5σ).
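The ratios in Equations 1 and 2 follow from the width of a Gaussian at a fraction f of its peak, w_f = 2σ√(2 ln(1/f)). The sketch below evaluates both relations and cross-checks the closed form against a brute-force count of velocity samples above a given fraction of the peak; for a Gaussian the w_20/w_50 ratio evaluates to √(ln 5/ln 2) ≈ 1.524. All function names are hypothetical, and the default 1.5σ threshold mirrors the blue curve in Figure 8.

```python
import math

import numpy as np


def width_at_fraction(fraction, sigma):
    """Full width of a Gaussian with dispersion `sigma` at `fraction` of its peak."""
    return 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / fraction))


def w20_over_w50():
    """Equation (1): constant ratio for any Gaussian line."""
    return math.sqrt(math.log(5.0) / math.log(2.0))


def wvel_over_w50(peak_snr, n_thr=1.5):
    """Equation (2) with F_peak = peak_snr * sigma and F_thr = n_thr * sigma."""
    if peak_snr <= n_thr:
        raise ValueError("line peak must exceed the flux threshold")
    return math.sqrt(math.log(peak_snr / n_thr) / math.log(2.0))


def numerical_width(fraction, sigma, dv=1e-3, v_max=None):
    """Brute-force check: total width of the region where profile >= fraction."""
    v_max = 10.0 * sigma if v_max is None else v_max
    v = np.arange(-v_max, v_max, dv)
    profile = np.exp(-0.5 * (v / sigma) ** 2)
    return np.count_nonzero(profile >= fraction) * dv


if __name__ == "__main__":
    print(f"w20/w50 = {w20_over_w50():.4f}")
    for snr in (3, 5, 10, 20):
        print(f"S/N = {snr:2d}: w_vel/w50 = {wvel_over_w50(snr):.3f}")
```

Note that w_vel/w_50 grows without bound as the peak signal-to-noise ratio increases, because a brighter line stays above a fixed threshold over a wider velocity range.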

Duchamp's measurement of w_50 (black data points)
is in excellent agreement with the expectation (black,
dashed line) over a wide range of peak signal-to-noise
ratios. Only for faint sources of F_peak < 5σ are the
measured line widths on average slightly smaller than
the true widths, but by no more than about 10 to 15%.

In contrast, Duchamp's measurements of w_20 and
w_vel (red and blue data points, respectively) are systematically too large over most of the covered range
of signal-to-noise ratio as compared to the theoretical
expectation (red and blue dashed lines, respectively).
Only for faint sources of F_peak ≲ 5σ do the measured
values of w_20 fall short of the theoretical ones. This result suggests that w_50 is the most accurate measurement of
line width provided by Duchamp and should be used
instead of w_20 and w_vel for the characterisation of astronomical sources. However, both w_50 and w_20 systematically fall short of the true line width for faint
sources below F_peak ≈ 5σ.
Figure 8: Ratio of measured versus true line width
for the point source models as a function of true
peak signal-to-noise ratio in bins of 1σ. The black
data points show w_50, the red data points w_20, and
the blue data points w_vel (for a 1.5σ flux threshold), all of which have been divided by the original
FWHM of the Gaussian line. The corresponding
theoretical expectations for a Gaussian line profile
are shown as the black, red, and blue dashed lines.



4.2.5 Peak Flux 

The ratio of recovered versus true peak flux of the
model point sources is plotted in the left-hand panel of
Figure 9 as a function of peak signal-to-noise ratio in
1σ bins. The dashed and dotted red lines indicate the
theoretical ±1σ and ±2σ envelopes, respectively. The
right-hand panel shows the same figure, but as a function of integrated flux in bins of 0.1 Jy km s⁻¹. For
bright sources of F_peak ≳ 10σ Duchamp accurately
recovers the peak flux of the sources, although there
is a general tendency for measured peak fluxes to be
slightly too high on average. For fainter sources of
F_peak ≲ 5σ there is a strong deviation, with measured
fluxes being systematically too high by a significant
factor. This is generally due to faint sources being
more likely to be detected when their maximum coincides with a positive noise peak, whereas faint sources
sitting on top of a negative noise peak will likely remain undetected, thereby creating a strong bias in the
measurement of peak fluxes.
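The selection effect described above can be reproduced with a toy Monte Carlo: a faint source's measured peak is its true peak plus Gaussian noise, and only measurements above a detection cut survive, so the surviving measurements are biased high. This is a minimal sketch, not Duchamp's algorithm: the single-pixel peak model, the hypothetical 3σ detection cut, and the function name are assumptions for illustration.

```python
import numpy as np


def mean_recovered_peak_ratio(true_snr, detect_snr=3.0, n_trials=200_000, seed=0):
    """Toy model of the peak-flux selection bias.

    Assumptions: each trial measures the peak as true_snr + N(0, 1) (in units
    of the noise sigma), and only trials above `detect_snr` count as detected.
    Returns the mean measured/true peak ratio over detected trials, or NaN if
    nothing is detected.
    """
    rng = np.random.default_rng(seed)
    measured = true_snr + rng.standard_normal(n_trials)
    detected = measured > detect_snr
    if not detected.any():
        return float("nan")
    return float(measured[detected].mean() / true_snr)


if __name__ == "__main__":
    for snr in (2, 3, 5, 10, 20):
        print(f"true S/N {snr:2d}: mean measured/true = "
              f"{mean_recovered_peak_ratio(snr):.3f}")
```

For a source exactly at the cut, only upward noise fluctuations survive, so the mean excess approaches the mean of a half-normal distribution (about 0.8σ); for bright sources almost every trial is detected and the ratio tends to 1.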

Even for high peak signal-to-noise ratios the peak
fluxes measured by Duchamp tend to be slightly too
large. Duchamp determines the peak flux of a source
by simply selecting the data element with the highest
flux encountered. As mentioned before, this method is
biased towards selecting data elements that have been
affected by positive noise peaks. In sources with broad
spectral signals there is a higher probability of finding
a positive noise fluctuation in one of the channels near the
peak of the line that increases the signal beyond the actual line peak. This is a result of the source being well
resolved in the spectral domain. Hence, peak fluxes
measured by Duchamp will generally be too high irrespective of source brightness as long as the source
is spectrally (or spatially) resolved. For very bright
sources, however, the relative error will be negligible.
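The maximum-value bias can be illustrated in the same toy framework: a Gaussian line sampled on channels, white unit-variance noise added, and the brightest channel taken as the peak. The channel layout (line centred on a channel, 101 channels), the white-noise model, and the function name are hypothetical assumptions. A spectrally resolved line offers several channels close to the true peak, so the maximum is pulled upward, while the relative bias shrinks for brighter sources.

```python
import numpy as np


def mean_peak_excess(peak_snr, line_sigma_chan, n_trials=20_000, seed=0):
    """Mean (brightest channel - true peak) for a noisy Gaussian line.

    Assumptions: line centred on a channel, unit-sigma white noise, and the
    'measured' peak is simply the maximum channel value. Result is in units
    of the noise sigma.
    """
    rng = np.random.default_rng(seed)
    channels = np.arange(-50, 51)
    line = peak_snr * np.exp(-0.5 * (channels / line_sigma_chan) ** 2)
    noisy = line + rng.standard_normal((n_trials, channels.size))
    return float(noisy.max(axis=1).mean() - peak_snr)


if __name__ == "__main__":
    for sig in (0.5, 2.5, 5.0):
        excess = mean_peak_excess(20.0, sig)
        print(f"line sigma = {sig} channels: mean excess = {excess:.2f} sigma")
    # Relative bias for a broad line shrinks as the source gets brighter.
    for snr in (10.0, 100.0):
        print(f"S/N = {snr:5.1f}: relative excess = {mean_peak_excess(snr, 5.0) / snr:.4f}")
```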
4.2.6 Integrated Flux 

The ratio of measured versus true integrated flux of
the model point sources as a function of peak signal-to-noise ratio in bins of 1σ is presented in the left-hand panel of Figure 10. The right-hand panel shows
the same figure, but as a function of integrated flux in
bins of 0.1 Jy km s⁻¹. Clearly, Duchamp's measurement of the integrated flux of a source is systematically too low by a significant factor. Even for the
brightest sources of F_peak ≈ 20σ only about 90% of
the true flux is recovered by Duchamp, and that figure
drops to well below 50% for faint sources of F_peak < 5σ.

This issue is likely caused by the fact that Duchamp 
only considers pixels above the detection threshold when 
calculating the integrated flux. Pixels below the thresh- 
old, while potentially contributing significantly to the 
overall flux of a source, are not included in the summa- 
tion carried out by Duchamp, resulting in integrated 
fluxes being systematically too small. 

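In order to study the expected decrease in the integrated flux measurement, let us assume a point source
with a Gaussian line profile being observed with a telescope with a radially symmetric Gaussian point spread
function (PSF),

$$F(x,y,v) = F_{\rm peak}\exp\left(-\frac{x^2+y^2}{2\sigma_{\rm PSF}^2} - \frac{v^2}{2\sigma_v^2}\right), \qquad (3)$$

with amplitude F_peak, velocity dispersion σ_v, and
PSF size σ_PSF. The integrated flux measurement can
then be considered as the integral under the three-dimensional Gaussian brightness profile across the velocity range, ±v_0, and the spatial range, ±x_0
and ±y_0, over which the flux of the line is above the
detection threshold, thus

$$F_{\rm int} = \int_{-v_0}^{v_0}\int_{-y_0}^{y_0}\int_{-x_0}^{x_0} F(x,y,v)\,{\rm d}x\,{\rm d}y\,{\rm d}v \qquad (4)$$

$$= F_{\rm peak}\,(2\pi)^{3/2}\,\sigma_{\rm PSF}^2\,\sigma_v\;{\rm erf}\!\left(\frac{x_0}{\sqrt{2}\,\sigma_{\rm PSF}}\right){\rm erf}\!\left(\frac{y_0}{\sqrt{2}\,\sigma_{\rm PSF}}\right){\rm erf}\!\left(\frac{v_0}{\sqrt{2}\,\sigma_v}\right), \qquad (5)$$

where erf(x) is the error function. Inserting the appropriate integration limits, i.e. the positions along each axis where the profile drops to F_thr, x_0 = y_0 = σ_PSF √(2 ln(F_peak/F_thr)) and v_0 = σ_v √(2 ln(F_peak/F_thr)), and then dividing Equation 5 by the total flux, F_int,tot = F_peak (2π)^{3/2} σ_PSF² σ_v (i.e. integrated over ±∞), leads to a theoretical integrated flux ratio of

$$\frac{F_{\rm int}}{F_{\rm int,tot}} = {\rm erf}^3\!\left(\sqrt{-\ln(F_{\rm thr}/F_{\rm peak})}\right) \qquad (6)$$

with a flux threshold of F_thr = n × σ.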
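Equation 6 is easy to evaluate numerically, and because the Gaussian is separable, each erf factor can be checked independently with a one-dimensional integral over a unit-dispersion profile (σ_PSF and σ_v cancel in the ratio). The sketch below does both; the function names and the midpoint-rule check are illustrative choices, not part of the paper.

```python
import math


def integrated_flux_ratio(peak_snr, n_thr=1.5):
    """Equation (6): recovered/true flux of a 3-D Gaussian integrated only over
    |x| < x0, |y| < y0, |v| < v0, the limits where it drops to n_thr * sigma."""
    if peak_snr <= n_thr:
        return 0.0
    return math.erf(math.sqrt(math.log(peak_snr / n_thr))) ** 3


def one_axis_fraction_numeric(peak_snr, n_thr=1.5, n_steps=100_000):
    """Midpoint-rule check of a single erf factor for a unit-sigma Gaussian."""
    # x0 is where exp(-x^2 / 2) equals n_thr / peak_snr.
    x0 = math.sqrt(2.0 * math.log(peak_snr / n_thr))
    h = 2.0 * x0 / n_steps
    total = sum(math.exp(-0.5 * (-x0 + (i + 0.5) * h) ** 2) for i in range(n_steps))
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for snr in (3, 5, 10, 20, 50):
        print(f"S/N = {snr:2d}: expected flux ratio = {integrated_flux_ratio(snr):.3f}")
```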

The resulting theoretical integrated flux according
to Equation 6, assuming a 1.5σ threshold, is shown
as the solid red curve in Figure 10. The integrated
fluxes measured by Duchamp are only slightly below
what one would expect from a simple integration over
a three-dimensional Gaussian. A fit to the data points
instead yields an effective flux threshold of 2.2σ, shown
as the dotted red curve in Figure 10, which is slightly
larger than the 1.5σ used when running Duchamp. It
is not quite clear why Duchamp performs worse than
expected. The discrepancy could be due to the fact
that the software sums over discrete pixels whereas we
assumed continuous integration in our mathematical
model. This will likely result in small differences, particularly in those cases where the number of elements
across the Gaussian profile is small. In our case, as we
are dealing with point sources, this is certainly true for
the spatial dimension.

Figure 9: Left-hand panel: Ratio of measured versus true peak flux (black data points) and corresponding
standard deviation (error bars) of the model point sources as a function of true peak signal-to-noise ratio
in 1σ bins. The dashed and dotted red lines indicate the theoretical ±1σ and ±2σ envelopes, respectively.
Right-hand panel: Same, but as a function of true integrated flux in bins of 0.1 Jy km s⁻¹.

Figure 10: Left-hand panel: Ratio of measured versus true integrated flux (black data points) and corresponding standard deviation (error bars) for the model point sources as a function of true peak signal-to-noise ratio in bins of 1σ. The solid red curve shows the theoretical expectation for the 1.5σ flux threshold
used in our test. The dotted red curve shows the best fit to the data points, corresponding to an effective flux threshold of 2.2σ. Right-hand panel: Same, but as a function of true integrated flux in bins of
0.1 Jy km s⁻¹.
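The paper does not say how the effective threshold of 2.2σ was fitted. One simple possibility, assumed here purely for illustration, is a least-squares grid search over n in Equation 6. The demo data are synthetic points generated from Equation 6 itself with n = 2.2 plus small noise, not the measurements in Figure 10, and the function names are hypothetical.

```python
import math

import numpy as np


def integrated_flux_ratio(peak_snr, n_thr):
    """Equation (6) evaluated for a given threshold n_thr (in noise sigma)."""
    if peak_snr <= n_thr:
        return 0.0
    return math.erf(math.sqrt(math.log(peak_snr / n_thr))) ** 3


def fit_effective_threshold(peak_snr, measured_ratio, grid=None):
    """Assumed method: least-squares grid search for the best-matching n_thr."""
    grid = np.arange(1.0, 4.0, 0.001) if grid is None else np.asarray(grid)
    pairs = list(zip(np.asarray(peak_snr, float), np.asarray(measured_ratio, float)))
    sse = [sum((integrated_flux_ratio(s, n) - r) ** 2 for s, r in pairs) for n in grid]
    return float(grid[int(np.argmin(sse))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snr = np.arange(5.0, 31.0)
    # Synthetic ratios from Equation (6) with n = 2.2 plus small Gaussian scatter.
    fake = np.array([integrated_flux_ratio(s, 2.2) for s in snr])
    fake += rng.normal(0.0, 0.005, snr.size)
    print(f"fitted effective threshold: {fit_effective_threshold(snr, fake):.3f} sigma")
```

A grid search is crude but robust here: Equation 6 is monotonic in n, has a single parameter, and is cheap to evaluate.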



In summary, integrated flux measurements pro- 
vided by Duchamp are systematically too small and 
will need to be corrected substantially to compensate 
for the systematic offset. 



5 Models of Disc Galaxies 

In order to test the performance of Duchamp on more
realistic, extended sources, we generated 1024 artificial
H I models of galaxies with a wide range of parameters,
using a programme written in C for direct manipulation of FITS data cubes. All galaxies were modelled
as infinitely thin discs with varying inclination (0° to
89°), position angle (0° to 180°), and rotation velocity (20 to 300 km s⁻¹). For any one galaxy, inclination
and position angle were considered to be constant over
the radial extent of the disc, while the rotation velocity
increases linearly from 0 to v_rot between the centre and
0.5 times the semi-major axis of the disc and remains
constant beyond that radius. Individual spectral profiles across the disc were assumed to be Gaussian with
a dispersion of 9.65 km s⁻¹ (equivalent to 2.5 times the
width of a spectral channel). The radial surface brightness profile was assumed to be Gaussian, too, resulting
in an elliptical Gaussian brightness distribution on the
sky.

Figure 11: Example of a model galaxy generated for testing Duchamp. The left-hand panel shows the
zeroth moment of the model, the middle panel shows the position-velocity diagram along the dashed, red
line, and the right-hand panel depicts the integrated spectrum of the model galaxy.
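The disc model described above can be sketched in NumPy: project an infinitely thin disc with a given inclination and position angle, apply the linearly rising then flat rotation curve, and place a Gaussian line of dispersion 9.65 km s⁻¹ at each pixel's line-of-sight velocity. This is not the authors' C programme; the Gaussian radial scale (`sigma_r`, defaulting to half the semi-major axis), the position-angle convention, the cube dimensions, and the function name are hypothetical assumptions.

```python
import numpy as np

CHANNEL_WIDTH = 3.86  # km/s, as in Table 4
LINE_SIGMA = 9.65     # km/s, 2.5 channel widths


def thin_disc_cube(npix=65, nchan=201, incl_deg=60.0, pa_deg=30.0,
                   v_rot=150.0, semi_major=20.0, sigma_r=None):
    """Noise-free cube (nchan, npix, npix) of an infinitely thin rotating disc.

    Assumptions not fixed by the text: sigma_r (Gaussian radial scale, in
    pixels) defaults to semi_major / 2, and the position angle is measured
    anticlockwise from the +x axis of the pixel grid.
    """
    sigma_r = 0.5 * semi_major if sigma_r is None else sigma_r
    centre = (npix - 1) / 2.0
    y, x = np.mgrid[0:npix, 0:npix] - centre
    incl, pa = np.radians(incl_deg), np.radians(pa_deg)

    # Rotate onto the major/minor axes, then deproject the minor axis.
    x_maj = x * np.cos(pa) + y * np.sin(pa)
    x_min = -x * np.sin(pa) + y * np.cos(pa)
    r = np.hypot(x_maj, x_min / np.cos(incl))

    # Rotation curve: linear rise to v_rot at 0.5 * semi_major, flat beyond.
    v_circ = v_rot * np.clip(r / (0.5 * semi_major), 0.0, 1.0)
    cos_theta = np.divide(x_maj, r, out=np.zeros_like(r), where=r > 0)
    v_los = v_circ * np.sin(incl) * cos_theta

    # Gaussian brightness at the deprojected radius (assumption: no
    # inclination-dependent brightness scaling is applied).
    brightness = np.exp(-0.5 * (r / sigma_r) ** 2)

    v_axis = (np.arange(nchan) - nchan // 2) * CHANNEL_WIDTH
    profile = np.exp(-0.5 * ((v_axis[:, None, None] - v_los[None]) / LINE_SIGMA) ** 2)
    return brightness[None] * profile, v_axis


if __name__ == "__main__":
    cube, v = thin_disc_cube()
    spectrum = cube.sum(axis=(1, 2))
    print("peak channel velocity:", v[np.argmax(spectrum)], "km/s")
```

Summing the cube over the spatial axes gives the integrated spectrum; for inclined discs with large v_rot it shows the double-horn shape discussed in Section 5.2.2.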

Next, we again generated an artificial ASKAP visi- 
bility data set of pure Gaussian noise at a frequency of 
1.4 GHz with characteristics similar to the WALLABY 
survey, using the Miriad task uvgen with parameters 
as listed in Table 4. The visibility data were Fourier- 
transformed to create an image of the point spread 
function and a noise data cube with 1601 × 1601 spatial pixels of 10 arcsec size and 201 spectral channels.
We then convolved the model galaxies with a clean
beam derived from fitting a Gaussian to the central
peak of the point spread function. Finally, the convolved galaxy models were placed on a regular grid of
32 × 32 galaxies and added to the noise cube to create
the final data cube of model galaxies for the testing of
Duchamp.
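The convolution and noise steps can be sketched as follows, assuming a circular Gaussian beam with the 30.9 arcsec major-axis FWHM from Table 4 (the real clean beam is very slightly elliptical), a model cube in Jy per pixel, and 10 arcsec pixels. Because a Gaussian is separable, two one-dimensional convolutions per channel are enough. The white noise added here is a simplification: the paper's noise cube came from Fourier-transformed visibilities and is therefore spatially correlated. All function names are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_kernel_1d(sigma_pix):
    """Normalised 1-D Gaussian kernel truncated at +/- 4 sigma."""
    half = int(np.ceil(4.0 * sigma_pix))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    return kernel / kernel.sum()


def convolve_with_beam(cube_jy_per_pix, fwhm_arcsec=30.9, pixel_arcsec=10.0):
    """Convolve each channel of a (z, y, x) cube with a circular Gaussian beam.

    Assumption: input is in Jy per pixel; the result is converted to Jy per
    beam by multiplying by the beam area in pixels (2 pi sigma^2).
    """
    sigma_pix = fwhm_arcsec / pixel_arcsec * FWHM_TO_SIGMA
    k = gaussian_kernel_1d(sigma_pix)
    # Separable convolution: along x (axis 2), then along y (axis 1).
    out = np.apply_along_axis(np.convolve, 2, cube_jy_per_pix, k, mode="same")
    out = np.apply_along_axis(np.convolve, 1, out, k, mode="same")
    return out * 2.0 * np.pi * sigma_pix ** 2


def add_white_noise(cube, rms=1.86e-3, seed=0):
    """Add uncorrelated Gaussian noise with the Table 4 RMS (a simplification)."""
    rng = np.random.default_rng(seed)
    return cube + rng.normal(0.0, rms, size=cube.shape)
```

A useful sanity check is that an unresolved 1 Jy source, convolved this way, should have a peak of about 1 Jy per beam.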

The moment-zero map, position-velocity map, and
integrated spectrum of one of the model galaxies are
shown in Figure 11 for illustration. As with the point
sources, all galaxies were centred on a pixel, although 
for extended sources we do not expect any significant 
effect from shifting the source centre with respect to 
the pixel centre. Again, all sources are isolated, and 
we did not attempt to test Duchamp in a situation 
where source crowding occurs. 

It is important to note at this point that the re- 
sulting model galaxies, while exhibiting some of the 
spatial and spectral characteristics of real spiral galax- 
ies, have been simplified to a great extent, resulting in 
limitations that need to be kept in mind when inter- 
preting the results presented in this section. Firstly, 
the assumption of an infinitely thin disc will result in 
unrealistic edge-on galaxies, with integrated fluxes as 
well as individual spectral line widths across the disc 
being too small. Secondly, parameters such as peak 
flux, angular size, or rotation velocity were all varied 
independently of each other, resulting in unrealistic 
combinations of galaxy parameters in some cases. The 
purpose of the models is to cover a vast parameter 



range of extended sources irrespective of whether that 
entire range is populated by real galaxies. Even if disc 
galaxies with a certain combination of parameters do 
not exist, other objects, such as irregular galaxies or 
high-velocity clouds, could still cover those regions of 
parameter space, and their exploration will therefore 
be meaningful. 



Table 4: Summary of the parameters used to generate the visibility data set and noise image for the
galaxy models.

Parameter (visibility)      Value          Unit
Number of antennas          36
System temperature          50             K
Declination                 −45°
Total integration time      8              h
Hour angle range            ±4             h
Cycle time                  36             s
Stokes parameters           I
Number of channels          201
Frequency                   1.42           GHz
Channel width               18.31          kHz
                            3.86           km s⁻¹

Parameter (image)           Value          Unit
Final image size            1601 × 1601    px
Field diameter              4.45°
Pixel size                  10             arcsec
Robustness
Gaussian uv taper           7.28           kλ
                            1.54           km
RMS noise                   1.86           mJy
Synthesised beam
  major axis                30.9           arcsec
  minor axis                30.5           arcsec
  position angle            50.8°
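Several entries in Table 4 are linked by simple conversions, which the sketch below reproduces as a consistency check: the channel width in km s⁻¹ (assuming the radio velocity convention at the H I rest frequency of 1420.405752 MHz, whereas the table lists the observing frequency only as 1.42 GHz), the field diameter from image size and pixel size, the uv taper in km, and the 9.65 km s⁻¹ dispersion and 22.7 km s⁻¹ FWHM of the galaxy line profiles. The constant and function names are illustrative.

```python
import math

C_KMS = 299_792.458        # speed of light in km/s
HI_REST_MHZ = 1420.405752  # H I rest frequency in MHz (assumed reference)


def channel_width_kms(dnu_khz, nu_mhz=HI_REST_MHZ):
    """Velocity width of a frequency channel (radio convention, assumption)."""
    return C_KMS * (dnu_khz * 1e-3) / nu_mhz


chan = channel_width_kms(18.31)                 # ~3.86 km/s
field_deg = 1601 * 10.0 / 3600.0                # image size x pixel size, in degrees
wavelength_m = C_KMS * 1e3 / (HI_REST_MHZ * 1e6)
taper_km = 7.28e3 * wavelength_m / 1e3          # 7.28 kilo-wavelengths in km
line_sigma = 2.5 * chan                         # dispersion of the galaxy models
line_fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * line_sigma

if __name__ == "__main__":
    print(f"channel width  : {chan:.3f} km/s")
    print(f"field diameter : {field_deg:.2f} deg")
    print(f"uv taper       : {taper_km:.2f} km")
    print(f"line sigma/FWHM: {line_sigma:.2f} / {line_fwhm:.1f} km/s")
```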
Table 5: Duchamp input parameters explicitly set in the input parameter file for the galaxy models. The
default values of Duchamp were used for all other parameters.

Parameter               Run 1     Run 2     Run 3     Comment
threshold               0.00186   0.00186   0.00186   1.0 × RMS
minPix                  10        10        10
minChannels             5         5         3
flagAdjacent            true      true      true
flagGrowth              false     true      true
growthThreshold                   0.00093   0.00093   0.5 × RMS
flagRejectBeforeMerge   false     true      true
flagATrous              true      true      true      Wavelet reconstruction
reconDim                3         3         3         in 3 dimensions
snrRecon                2         2         2
scaleMin                3         3         3


5.1 Running Duchamp 

We ran Duchamp (version 1.1.12) on the model galaxy
cube several times with slightly different input parameters to compare the performance. The different input
parameters explicitly set in the parameter file are listed
and compared in Table 5. In all cases we employed a
1σ flux threshold, equivalent to about 1.9 mJy, and
performed a three-dimensional 'à trous' wavelet reconstruction with a minimum scale of 3 and a flux threshold of 2σ for wavelet components to be included in the
reconstructed cube. The slightly larger minimum scale
as compared to the point source models is motivated
by the fact that we are now dealing with spatially and
spectrally much more extended sources. In addition,
we varied the number of contiguous spectral channels
required for detections and used Duchamp's growth
criterion in runs 2 and 3 with a growth flux threshold of
0.5σ. The latter method grows detections to flux
levels below the original detection threshold, resulting
in more accurate source parametrisation. As it turned
out, the change from 5 to 3 consecutive spectral channels for detections (run 2 versus 3) did not have any
major impact on the results. Hence, only the results
of runs 1 and 3 will be presented and discussed here.
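To make the three configurations in Table 5 reproducible, a small script can write one Duchamp parameter file per run. The sketch below assumes Duchamp's plain "parameter value" line format; the parameter names are those listed in Table 5 plus `imageFile` for the input cube, and the file name `galaxy_models.fits` and the helper function names are hypothetical.

```python
# Parameters shared by all runs (Table 5).
COMMON = {
    "threshold": 0.00186,   # 1.0 x RMS (1.86 mJy)
    "minPix": 10,
    "flagAdjacent": True,
    "flagATrous": True,     # wavelet reconstruction ...
    "reconDim": 3,          # ... in 3 dimensions
    "snrRecon": 2,
    "scaleMin": 3,
}
# Parameters that differ between runs (Table 5).
RUNS = {
    1: {"minChannels": 5, "flagGrowth": False, "flagRejectBeforeMerge": False},
    2: {"minChannels": 5, "flagGrowth": True, "growthThreshold": 0.00093,
        "flagRejectBeforeMerge": True},
    3: {"minChannels": 3, "flagGrowth": True, "growthThreshold": 0.00093,
        "flagRejectBeforeMerge": True},
}


def format_value(value):
    # bool must be tested before int/float, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def duchamp_parameter_file(run, image_file="galaxy_models.fits"):
    """Return the text of a parameter file for the given run (1, 2 or 3)."""
    params = {"imageFile": image_file, **COMMON, **RUNS[run]}
    width = max(len(name) for name in params)
    lines = [f"{name:<{width}}  {format_value(value)}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(duchamp_parameter_file(3))
```

Keeping the shared and run-specific parameters in separate dictionaries mirrors the structure of Table 5 and makes it obvious which settings actually change between runs.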

In order to compare the outcome of Duchamp with
the original input catalogue, we wrote a short Python
script that reads in and processes the different catalogues. The script first reads in the Duchamp output
catalogue, the original model catalogue, and a special mask data cube marking pixels with emission in
the original model by assigning them a unique number
characteristic to each input source. The script then
cycles through all the detections made by Duchamp
and decides for each detection whether it is genuine or
not by checking the value of the mask data cube at the
same position. If the detection is found to be genuine,
the script cycles through the original model catalogue to extract the actual input parameters of the
respective source for comparison with the parametrisation results of Duchamp.
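The matching logic described above, together with the summary statistics discussed next, can be sketched as follows. This is not the authors' script: detections are assumed to be given as (z, y, x) centroid positions in pixel coordinates and are rounded to the nearest voxel before the mask lookup, and the function names are hypothetical. Reliability is the fraction of detections that are genuine, completeness the fraction of input sources hit at least once, and the break-up fraction the fraction of detected sources hit more than once.

```python
from collections import Counter

import numpy as np


def match_detections(mask, positions):
    """Look up the mask value at each detection's (z, y, x) position.

    mask: integer cube, 0 where the model has no emission and k > 0 inside
    model source k. positions: detection centroids in pixel coordinates,
    rounded here to the nearest voxel (an assumption).
    Returns a list of source IDs, with 0 marking a false detection.
    """
    ids = []
    for z, y, x in positions:
        iz, iy, ix = (int(round(c)) for c in (z, y, x))
        ids.append(int(mask[iz, iy, ix]))
    return ids


def summarise(ids, n_sources):
    """Reliability, completeness and break-up fraction from matched IDs."""
    genuine = [i for i in ids if i > 0]
    per_source = Counter(genuine)  # number of detections per input source
    return {
        "reliability": len(genuine) / len(ids) if ids else float("nan"),
        "completeness": len(per_source) / n_sources,
        "broken_up": (sum(1 for n in per_source.values() if n > 1) / len(per_source)
                      if per_source else float("nan")),
    }
```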

At the end of this process we get a match of de- 
tected sources with original input sources, allowing us 
to calculate parameters such as completeness, reliabil- 



ity, and the fraction of sources being broken up into 
multiple detections by Duchamp. In addition, we are 
able to compare the original parameters of each source 
with those determined by Duchamp to test the per- 
formance of Duchamp's parametrisation algorithms. 

5.2 Results 

5.2.1 Completeness and Reliability 

For run 1 (without growing of detections to flux levels
below the threshold), 436 out of 1063 detected sources
are genuine, resulting in an overall reliability of 41%.
As many original sources were broken up into multiple
detections, only 194 of the 1024 input galaxies were detected, yielding an overall completeness of only 19%.
There is a significant improvement for run 3 (with
growing of detections to a flux level of 0.5σ), where
542 out of 1051 detected sources are genuine (reliability of 52%), and this time 521 of the 1024 input
galaxies were detected, resulting in a much improved
overall completeness of 51%.

Completeness as a function of different galaxy parameters is shown in Figure 12 for runs 1 and 3 (black
and red data points, respectively). As mentioned before, run 1 resulted in very low completeness values of
typically only about 20% and no strong variation with
either the integrated flux of a source or its inclination
and rotation velocity. By growing detections to a flux
level of 0.5σ (run 3) we achieved much higher completeness levels over a large parameter range. A completeness of
100% is achieved for sources of F_int > 20 Jy km s⁻¹,
and completeness levels reach 50% at F_int ≈ 2.5 Jy km s⁻¹.
The latter corresponds to an H I mass sensitivity of
6 × 10⁵ M☉ at a distance of 1 Mpc, or 6 × 10⁹ M☉
at 100 Mpc, for the expected 8-hour integration per
pointing of the WALLABY project on ASKAP.
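The mass sensitivities quoted above follow from the standard optically thin conversion M_HI = 2.356 × 10⁵ D² S (M☉, with D in Mpc and S in Jy km s⁻¹). The paper does not state which conversion it used, so this formula is an assumption here, and cosmological corrections are neglected.

```python
def hi_mass_msun(flux_jy_kms, distance_mpc):
    """Optically thin H I mass in solar masses (assumed standard conversion):
    M = 2.356e5 * D^2 * S, with D in Mpc and S in Jy km/s."""
    return 2.356e5 * distance_mpc ** 2 * flux_jy_kms


if __name__ == "__main__":
    for d in (1.0, 100.0):
        print(f"D = {d:5.0f} Mpc: M_HI = {hi_mass_msun(2.5, d):.2e} M_sun")
```

With S = 2.5 Jy km s⁻¹ this gives about 5.9 × 10⁵ M☉ at 1 Mpc and 5.9 × 10⁹ M☉ at 100 Mpc, consistent with the rounded values in the text.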

As shown in the middle and bottom panels of Figure 12, there is a strong variation of completeness with
both inclination and rotation velocity of the galaxies.
While face-on galaxies are on average detected at completeness levels near 80%, Duchamp struggles to find
edge-on galaxies, yielding average completeness levels
of only about 20% for galaxies with inclination angles
greater than 80°. This is caused by the combination of two separate effects. Firstly, as a result
of the limitations from our assumption of an infinitely
thin disc, edge-on galaxies typically have lower integrated fluxes than face-on galaxies. Secondly, edge-on galaxies typically have a broader spectral signature
as a result of their higher projected rotation velocity,
making it more difficult for Duchamp to pick up their
extended signal.

Figure 12: Completeness for the galaxy models
as a function of true integrated flux in bins of
2.5 Jy km s⁻¹ (top panel), galaxy inclination in
bins of 5° (middle panel), and rotation velocity
in bins of 19.3 km s⁻¹ (bottom panel) for runs 1
(black) and 3 (red).

Figure 13: Fraction of model galaxies being broken up into two or more separate detections by
Duchamp as a function of true integrated flux in
bins of 2.5 Jy km s⁻¹ (top panel), galaxy inclination in bins of 5° (middle panel), and rotation velocity in bins of 19.3 km s⁻¹ (bottom panel) for
runs 1 (black) and 3 (red).

The latter effect can also be seen in the bottom
panel of Figure 12, where completeness levels systematically decrease as a function of increasing rotation
velocity of a galaxy, irrespective of its inclination or
integrated flux, confirming that on average Duchamp
is more likely to pick up face-on galaxies with narrow
spectral lines. It is important to note, however, that at
a given distance galaxies with higher rotation velocity
will typically have a larger H I mass and are therefore
more likely to be detected than galaxies with lower
rotation velocity at the same distance.

Publications of the Astronomical Society of Australia

Figure 14: Left-hand panel: Position errors (from run 3) for the model galaxies in right ascension and
declination. Right-hand panel: Mean absolute position error (data points) and standard deviation (error
bars) as a function of true integrated flux in bins of 2.5 Jy km s⁻¹.

5.2.2 Break-up of Sources into Multiple Components

Due to their rotation velocity, spiral galaxies often ex- 
hibit a large radial velocity gradient across their pro- 
jected disc on the sky, resulting in the characteristic 
double-horn profile of their integrated spectrum. This, 
however, can result in the two halves of a galaxy being
detected as two separate sources by Duchamp, in particular in the case of faint, edge-on galaxies with large
rotation velocities.

In Figure 13 we have plotted the fraction of de- 
tected model galaxies that were broken up into two or 
more separate detections by Duchamp as a function 
of true integrated flux (top panel), inclination (mid- 
dle panel), and rotation velocity (bottom panel). For 
run 1 (black data points) there is a very high frac- 
tion of multiple detections, typically about 60-80%, 
with no strong variation with either integrated flux of 
the galaxy or inclination and rotation velocity of the 
disc. In total, 136 out of the 194 detected galaxies, or
70.1%, were broken up into multiple components by
Duchamp.

Growing detections to the 0.5σ level (run 3, red
data points) results in a major improvement, with the
number of multiple detections (in total 21 out of 521 detected galaxies, or 4.0%) dropping to zero over most
of the covered parameter range. Only for faint sources
of F_int < 5 Jy km s⁻¹ does the fraction of multiple detections gradually increase up to about 10% at the low
end of the flux spectrum. Figure 13 also clearly shows
the expected increase in multiple detections for galaxies of higher inclination (i > 40°) and rotation velocity
(v_rot ≳ 150 km s⁻¹), which is the result of the double-horn profile becoming wider and more pronounced as
the radial velocity gradient in the plane of the sky increases.

A similar case, although more difficult to assess, is 
the detection of only one half of a galaxy (one horn 



of the double-horn profile), whereas the other half
remains undetected. As there is only a single detection of
each affected galaxy, such partial detections are much 
more difficult to identify. They should, however, result 
in a significant offset of both the measured position and 
radial velocity of the detected source with respect to 
the location of the originating galaxy. 

In the case of run 3, 62 out of 500 single detections
show velocity errors of more than 20 km s⁻¹, with 28
even exceeding 150 km s⁻¹. The former corresponds to
a fraction of 12.4% of all single detections. Similarly,
62 out of 500 singly detected sources have a position
error of more than 20 arcsec, which again corresponds
to a fraction of 12.4%.

These results suggest that, even when growing detections down to the 0.5σ level, there is a significant
number of partial (approximately 66 sources) or multiple (21 sources) detections, corresponding to an overall
fraction of about 16.7% of all genuine detections. Such
cases need to be identified in the output catalogue pro- 
duced by Duchamp, as otherwise they will introduce 
a significant bias in the measurement of source param- 
eters such as line width and H I mass. Identification of 
broken-up sources will be a very difficult task in prac- 
tice, as it may be impossible to decide whether two 
detections are part of the same source or two sepa- 
rate sources in close proximity. While the growing of 
detections to lower flux levels can in principle reduce 
the fraction of sources being broken up, an undesirable 
side effect will be the potential merging of neighbour- 
ing sources, e.g. close galaxy pairs in group or cluster 
environments. 

5.2.3 Source Position 

The left-hand panel of Figure 14 shows a scatter plot
of position errors for the model galaxies (based on
run 3) in right ascension and declination. The mean
position errors in right ascension and declination are
1.7 ± 14.7 arcsec and 0.6 ± 12.7 arcsec, respectively.
The standard deviation is fairly large because there
are several sources with position errors of tens of arcsec, well beyond the central concentration in the plot.
These are cases in which only one half of a galaxy was
detected as a source, whereas the other half remained
undetected, resulting in systematic offsets in position
as well as velocity with respect to the original model.

When excluding such cases of partial detections by
only considering detections with position errors of less
than 15 arcsec in both right ascension and declination,
we obtain corrected errors of 0.9 ± 3.6 arcsec in right
ascension and 0.5 ± 3.6 arcsec in declination.

¹ There is no exact match between the 62 sources with
large position error and the 62 sources with large velocity
error. A total of 66 sources fulfil either of the two criteria.

Figure 15: Histogram of radial velocity errors
(from run 3) for the model galaxies in bins of
0.5 km s⁻¹.
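The exclusion step described above amounts to a hard cut at 15 arcsec in both coordinates before computing the mean and standard deviation. A minimal NumPy sketch follows; the function name is hypothetical and the demo uses a synthetic population (a well-behaved core plus offset "half-galaxy" outliers in right ascension), not the run 3 catalogue.

```python
import numpy as np


def clipped_position_errors(d_ra, d_dec, limit_arcsec=15.0):
    """Mean and standard deviation of position errors, keeping only
    detections with |error| < limit in both coordinates.

    Returns ((mean_ra, std_ra), (mean_dec, std_dec), n_kept).
    """
    d_ra = np.asarray(d_ra, dtype=float)
    d_dec = np.asarray(d_dec, dtype=float)
    keep = (np.abs(d_ra) < limit_arcsec) & (np.abs(d_dec) < limit_arcsec)
    ra, dec = d_ra[keep], d_dec[keep]
    return ((ra.mean(), ra.std(ddof=1)), (dec.mean(), dec.std(ddof=1)), int(keep.sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic errors: 5000 well-detected sources plus 500 partial detections
    # offset by 20-60 arcsec in right ascension (illustrative assumption).
    ra = np.concatenate([rng.normal(0.9, 3.6, 5000),
                         rng.uniform(20, 60, 500) * rng.choice([-1, 1], 500)])
    dec = rng.normal(0.5, 3.6, 5500)
    print("unclipped RA: %.1f +/- %.1f" % (ra.mean(), ra.std(ddof=1)))
    (m_ra, s_ra), (m_dec, s_dec), n = clipped_position_errors(ra, dec)
    print(f"clipped ({n} kept): RA {m_ra:.1f} +/- {s_ra:.1f}, Dec {m_dec:.1f} +/- {s_dec:.1f}")
```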

The combined, absolute position error as a function 
of true integrated flux is shown in the right-hand panel 
of Figure 14. For bright sources of F_int ≳ 10 Jy km s⁻¹
source positions are very accurate with typical errors
of about 2.5 arcsec. Towards the faint end of the diagram both mean error and standard deviation increase
substantially, partly as a result of increasing statistical
uncertainties, but also due to an increasing fraction of
galaxies that are only partially detected.

5.2.4 Radial Velocity 

The mean velocity error (based on run 3) for the galaxy
models is −1.9 ± 54.5 km s⁻¹. As in the case of source
position, the large standard deviation about the mean
is caused by galaxies that are only partially detected.
By including only sources with position errors of less
than 15 arcsec in both right ascension and declination
and velocity errors of less than 20 km s⁻¹ we can exclude such partial detections, resulting in a corrected
mean radial velocity error of −0.8 ± 4.6 km s⁻¹.

A histogram of radial velocity errors for the galaxy
models is shown in Figure 15. As in the case of point
sources, the distribution is not exactly Gaussian. Instead, there is a sharp peak near zero and an underlying broad distribution of errors, in particular in the
negative range. Some of these non-Gaussian structures
could again be the result of digitisation effects in conjunction with the spectral channel width of 3.86 km s⁻¹,
while we have no conclusive explanation for the noticeable asymmetry of the distribution.

5.2.5 Line Width 

In order to estimate the original line width of the input
models, we calculated a 'pseudo line width' which combines the intrinsic width of an individual line profile
with the overall, integrated line width resulting from
the rotation velocity of the galaxy, thus

$$w_{\rm mod} = \sqrt{\left[2\,v_{\rm rot}\sin(i)\right]^2 + w_{\rm int}^2}, \qquad (7)$$

where v_rot is the rotation velocity of the model galaxy,
i is the inclination of the disc, and w_int = 22.7 km s⁻¹
is the intrinsic FWHM of the Gaussian spectral line at
each position across the galaxy.
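Equation 7 is a quadrature sum of the projected rotational width and the intrinsic line FWHM (22.7 km s⁻¹ ≈ 2.355 × 9.65 km s⁻¹). The sketch below evaluates it for a few illustrative parameter combinations within the model ranges; the function name is hypothetical.

```python
import math

W_INT = 22.7  # km/s, intrinsic FWHM of each Gaussian line (~2.355 x 9.65 km/s)


def pseudo_line_width(v_rot, incl_deg, w_int=W_INT):
    """Equation (7): w_mod = sqrt([2 v_rot sin i]^2 + w_int^2), in km/s."""
    return math.hypot(2.0 * v_rot * math.sin(math.radians(incl_deg)), w_int)


if __name__ == "__main__":
    for v_rot, incl in [(20, 10), (150, 45), (300, 89)]:
        print(f"v_rot={v_rot:3d} km/s, i={incl:2d} deg: w_mod = "
              f"{pseudo_line_width(v_rot, incl):6.1f} km/s")
```

For face-on or slowly rotating discs w_mod tends to w_int, while for fast, highly inclined discs it is dominated by 2 v_rot sin(i).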

The left-hand panel of Figure 16 shows the mean
ratio of the measured line width, w_50, over the calculated 'pseudo line width', w_mod, as a function of
true integrated flux in bins of 2.5 Jy km s⁻¹ (based on
run 3). Duchamp measures accurate line widths close
to the true value over a wide range of fluxes. The small
deviation from the value of 1 can be easily explained
by the fact that w_mod is just an approximation to the
FWHM of the line profile. Only for fainter sources of
F_int ≲ 5 Jy km s⁻¹ does the line width ratio decrease
and the standard deviation increase significantly, indicating larger errors in Duchamp's measurement of
line width.

In the right-hand panel of Figure 16 we have plotted the ratio w_50/w_mod as a function of w_mod in
bins of 50 km s⁻¹. While line width measurements for
sources with narrow lines of w_mod ≲ 250 km s⁻¹ are on
average accurate, there is a systematic discrepancy for
sources with broader lines, the line widths measured by
Duchamp being systematically too small. The large
standard deviation suggests that this could have been
caused by cases in which only one half of the galaxy
was detected, whereas the other half remained undetected, resulting in a significantly lower value of the
measured line width. Nevertheless, line width measurements for fully detected sources should be accurate even if their line widths are large. This problem
again demonstrates the need to identify partially detected sources to avoid systematic errors that would
affect the scientific interpretation of the data.

5.2.6 Integrated Flux 

The ratio of measured versus true integrated flux of the
model galaxies, based on run 3, is shown in Figure 17.
Similar to our previous tests on point sources (see Figure 10), the integrated flux measured by Duchamp is
systematically too low. For bright sources of F_int ≈
20 Jy km s⁻¹ a large fraction of approximately 95% of
the flux is recovered, whereas this figure drops to below 60% for fainter sources of F_int < 2 Jy km s⁻¹. At
the same time, the scatter significantly increases, suggesting larger uncertainties (on a relative scale) in the
flux measurement of faint sources.
Figure 16: Ratio of measured (w_50, from run 3) versus true line width for the model galaxies as a function
of true integrated flux in bins of 2.5 Jy km s⁻¹ (left-hand panel) and true line width in bins of 50 km s⁻¹
(right-hand panel). The error bars indicate the standard deviation about the mean.

Figure 17: Ratio of measured (from run 3) versus true integrated flux of the model galaxies as
a function of true integrated flux. The error bars
indicate the standard deviation about the mean.
The bin width decreases towards lower fluxes.



As discussed previously, the reason for the failure of Duchamp to accurately determine the integrated flux of a source is that the software only sums over data elements that are above the flux threshold and hence misses some of the flux. Even growing the detections down to the 0.5σ level has not solved this fundamental problem, although the effect is less severe than for the point source models without growing (see the right-hand panel of Figure 10 for comparison).
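The size of this effect can be illustrated with a noise-free toy model. Summing only the channels of a Gaussian line that lie at or above a threshold recovers a fraction of the total flux that grows as the threshold is lowered, which is why growing detections to a lower level helps but cannot remove the bias entirely. The function name and the line parameters below are arbitrary illustrative assumptions, not those of our model sources.

```python
import numpy as np

def thresholded_flux_fraction(peak, sigma_chan, threshold, n_chan=401):
    """Fraction of a noise-free Gaussian line's total flux that is recovered
    when only channels at or above `threshold` are summed (toy model)."""
    offsets = np.arange(n_chan) - n_chan // 2           # channels from line centre
    line = peak * np.exp(-0.5 * (offsets / sigma_chan) ** 2)
    return line[line >= threshold].sum() / line.sum()

# Hypothetical line: peak of 3 (in units of the noise rms), sigma of 5 channels.
at_detection = thresholded_flux_fraction(3.0, 5.0, threshold=2.0)   # detection level only
after_growth = thresholded_flux_fraction(3.0, 5.0, threshold=0.5)   # after growing to 0.5
print(f"summed above 2.0: {at_detection:.2f}, above 0.5: {after_growth:.2f}")
```

In real data the noise additionally scatters pixels across the threshold, so this sketch only captures the truncation part of the bias.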

6 Model Cube Based on Real Galaxies

So far, we have tested Duchamp on artificial sources embedded in perfectly Gaussian noise. While this is useful to study the basic performance of the software, real observations will be more challenging for any source finder due to the more complex morphology of real sources and the presence of various artefacts in the data, e.g. terrestrial and solar interference, spectral baseline instabilities, or residual continuum emission.

Unfortunately, it is not possible to simply test Duchamp on a real H I data cube, because we would not have a priori knowledge of the sources in such a cube and would not be able to assess which of the detections made by Duchamp are genuine. A solution to this problem would be to inject copies of real galaxies into a real data cube of "pure" noise, i.e. a data cube extracted from telescope observations that does not contain any H I sources above the noise level. This method combines the advantages of artificial source models, where the source locations and parameters are exactly known, with those of real observations with realistic sources and artefacts.

For this purpose, we generated a data cube containing real noise extracted from an observation with the Westerbork Synthesis Radio Telescope (WSRT). We then added about 100 data cubes from the "Westerbork Observations of Neutral Hydrogen in Irregular and Spiral Galaxies" (WHISP) survey (Kamphuis, Sijbring, & van Albada 1996; Swaters et al. 2002), each containing one or more galaxies. The selected WHISP data cubes were artificially redshifted by scaling their size and flux level to match sources in a redshift range of 0.02 < z < 0.04, centred on the median redshift of 0.03 expected for the WALLABY project (Koribalski & Staveley-Smith 2009). The procedure for creating the test data cube is explained in more detail by Serra, Jurek, & Flöer (2011).

The final test data cube has a size of 360 × 360 spatial pixels and 1464 spectral channels. The pixel size of 10 arcsec (with a synthesised beam width of 30 arcsec) and the channel width of 18.3 kHz (equivalent to about 4 km s^-1) were chosen to reflect the expected specifications of WALLABY. Figure 18 shows an example image and spectra of two of the galaxies in the final cube. As the locations and properties of the injected galaxies are well known, we can directly compare them to the output of Duchamp to assess performance indicators such as completeness and reliability.

Figure 18: Left-hand panel: Moment-zero map of a small region of the WSRT cube with injected WHISP galaxies, showing two galaxies labelled A and B. Right-hand panels: Integrated spectra of the two galaxies.
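As a quick check of the quoted equivalence, the velocity width of one channel is approximately Δv ≈ c Δν/ν_obs, where ν_obs = ν_HI/(1 + z) and ν_HI ≈ 1420.406 MHz is the rest frequency of the H I line. The sketch below evaluates this for 18.3 kHz channels and also converts the 30-arcsec Gaussian beam into pixel units via the standard beam solid angle πθ²/(4 ln 2). The function names are our own, and the velocity conversion is a first-order approximation that ignores velocity-convention subtleties.

```python
import math

C_KM_S = 299_792.458          # speed of light in km/s
NU_HI_MHZ = 1420.405751768    # rest frequency of the H I 21-cm line in MHz

def channel_width_kms(dnu_khz, z=0.0):
    """Approximate observer-frame velocity width of one channel: c * dnu / nu_obs."""
    nu_obs_mhz = NU_HI_MHZ / (1.0 + z)
    return C_KM_S * (dnu_khz * 1e-3) / nu_obs_mhz

def beam_area_pixels(fwhm_arcsec, pixel_arcsec):
    """Solid angle of a circular Gaussian beam, pi * FWHM^2 / (4 ln 2), in pixels."""
    return math.pi * fwhm_arcsec ** 2 / (4.0 * math.log(2.0)) / pixel_arcsec ** 2

print(round(channel_width_kms(18.3), 2))          # 3.86 (at z = 0)
print(round(channel_width_kms(18.3, z=0.03), 2))  # 3.98 (observer frame at z = 0.03)
print(round(beam_area_pixels(30.0, 10.0), 1))     # 10.2
```

Under this approximation, the minPix value of 25 used in all runs (Table 6) corresponds to roughly 2.5 beam areas.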

6.1 Running Duchamp 

We ran Duchamp multiple times on the WSRT data cube with WHISP galaxies to probe different input parameter settings. A summary of the runs and parameters used is given in Table 6. We mainly covered a wide range of flux thresholds between 1.0σ and 3.5σ and tested one-dimensional (spectral domain only) versus three-dimensional (spatial and spectral domain) wavelet reconstruction of the cube. The output catalogue of each run was again cross-matched with the original source catalogue, using the Python script described in Section 5.1.
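The Python script of Section 5.1 is not reproduced here. As a hypothetical stand-in, the sketch below greedily matches each input source to the nearest unused detection within assumed spatial (max_sep_pix) and spectral (max_dchan) tolerances, then computes completeness (fraction of input sources recovered) and reliability (fraction of detections that are genuine). The function names and tolerance values are illustrative assumptions, not the settings used in our tests.

```python
import numpy as np

def cross_match(true_xyz, det_xyz, max_sep_pix=3.0, max_dchan=5.0):
    """Greedy one-to-one matching of input sources to detections.

    true_xyz, det_xyz: sequences of (x, y, channel) positions.
    max_sep_pix, max_dchan: hypothetical spatial and spectral tolerances.
    Returns a list of (index_true, index_detection) pairs.
    """
    true_xyz = np.asarray(true_xyz, dtype=float).reshape(-1, 3)
    det_xyz = np.asarray(det_xyz, dtype=float).reshape(-1, 3)
    used = np.zeros(len(det_xyz), dtype=bool)    # each detection is matched at most once
    pairs = []
    for i, (x, y, chan) in enumerate(true_xyz):
        sep = np.hypot(det_xyz[:, 0] - x, det_xyz[:, 1] - y)
        ok = (sep <= max_sep_pix) & (np.abs(det_xyz[:, 2] - chan) <= max_dchan) & ~used
        if ok.any():
            candidates = np.flatnonzero(ok)
            j = candidates[np.argmin(sep[candidates])]   # nearest allowed detection
            used[j] = True
            pairs.append((i, int(j)))
    return pairs

def completeness_reliability(n_true, n_detections, n_matched):
    """Completeness = recovered / input sources; reliability = genuine / all detections."""
    return n_matched / n_true, n_matched / n_detections

true_sources = [(10, 10, 100), (50, 50, 200)]
detections = [(10.5, 10, 101), (80, 80, 300), (49, 51, 198)]
matches = cross_match(true_sources, detections)
compl, rel = completeness_reliability(len(true_sources), len(detections), len(matches))
```

Greedy matching depends on the order of the input list; for crowded fields a globally optimal assignment would be more robust.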

In order to obtain the original source catalogue, we ran Duchamp once on the input model cube without noise, using a very low detection threshold, well below the final noise level, and no wavelet reconstruction. This resulted in a list of 100 sources against which the output catalogue provided by Duchamp can be judged. Since this method already introduces a strong bias in the catalogue of source parameters, we will only analyse the completeness and reliability of Duchamp; we shall not attempt to assess the parametrisation performance of the software, because we do not have an exact source catalogue against which we could compare the source parameters measured by Duchamp in the final test cube.

6.2 Results 

Completeness and reliability of the different runs of Duchamp on the WSRT model cube with WHISP galaxies are listed in Table 6 and displayed in Figure 19 as a function of detection threshold. Generally, between about 40% and 60% of all galaxies in the cube were found by Duchamp, while the overall reliability varies strongly from about 10% to 100% depending on detection threshold and wavelet reconstruction parameters.

We achieve better results for one-dimensional wavelet reconstruction (black and blue data points in Figure 19), which generally yields higher completeness and reliability than three-dimensional wavelet reconstruction (red data points). This is presumably due to the small angular size of most galaxies in the model cube: there is not much to gain from performing a wavelet reconstruction in the spatial domain, whereas one-dimensional wavelet reconstruction in the frequency domain yields much better results because most galaxies are well resolved and extended in frequency.
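To make the one-dimensional case concrete, the following is a simplified, single-pass sketch of an à trous wavelet reconstruction of a single spectrum. It assumes a B3-spline kernel, a robust (median absolute deviation) noise estimate per scale, and hard thresholding of wavelet coefficients at snr_recon times that noise. The parameters snr_recon and scale_min loosely mirror Duchamp's snrRecon and scaleMin in Table 6, but this is our own illustration, not Duchamp's implementation, which differs in its details.

```python
import numpy as np

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # B3-spline smoothing kernel

def _smooth(x, step):
    """Convolve x with the B3 kernel whose taps are `step` channels apart (the 'holes')."""
    n = x.size
    out = np.zeros(n)
    base = np.arange(n)
    for offset, weight in zip(range(-2, 3), B3):
        idx = np.abs(base + offset * step)                    # reflect at the low edge
        idx = np.where(idx > n - 1, 2 * (n - 1) - idx, idx)   # reflect at the high edge
        out += weight * x[np.clip(idx, 0, n - 1)]
    return out

def atrous_reconstruct_1d(spectrum, snr_recon=4.0, scale_min=1, n_scales=5):
    """Keep significant wavelet coefficients on scales >= scale_min, add the residual smooth."""
    c = np.asarray(spectrum, dtype=float)
    recon = np.zeros_like(c)
    for j in range(1, n_scales + 1):
        smooth = _smooth(c, 2 ** (j - 1))
        w = c - smooth                                            # coefficients at scale j
        sigma_j = 1.4826 * np.median(np.abs(w - np.median(w)))   # robust noise estimate
        if j >= scale_min:
            recon += np.where(np.abs(w) > snr_recon * sigma_j, w, 0.0)
        c = smooth
    return recon + c

rng = np.random.default_rng(1)
chan = np.arange(256)
truth = 3.0 * np.exp(-0.5 * ((chan - 128) / 8.0) ** 2)   # broad line, peak 3 sigma
noisy = truth + rng.normal(0.0, 1.0, chan.size)
clean = atrous_reconstruct_1d(noisy, snr_recon=4.0, scale_min=3)
print(np.sqrt(np.mean((noisy - truth) ** 2)), np.sqrt(np.mean((clean - truth) ** 2)))
```

Because the line is broad in frequency, its signal sits on the larger spectral scales, which is where the reconstruction retains it while suppressing channel-to-channel noise.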

In Figure 20 we plot completeness as a function of integrated flux for selected runs of Duchamp. Above F_int = 3 Jy km s^-1, Duchamp consistently finds all sources irrespective of the input parameters chosen. At lower fluxes the different runs produce significantly different results, with the one-dimensional wavelet reconstruction (black and blue data points) generally performing better than the three-dimensional reconstruction (red data points), as noted before.
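Completeness curves of this kind are obtained by binning the input sources in true integrated flux and counting the fraction detected in each bin. A minimal sketch, assuming a boolean "detected" flag per input source from the cross-match; the function names are hypothetical, and the linear interpolation used to locate the 50% completeness flux is our choice, which assumes completeness rises with flux near that level.

```python
import numpy as np

def completeness_curve(true_flux, detected, bin_edges):
    """Fraction of input sources detected in each bin of true integrated flux."""
    true_flux = np.asarray(true_flux, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    bin_edges = np.asarray(bin_edges, dtype=float)
    n_all, _ = np.histogram(true_flux, bins=bin_edges)
    n_found, _ = np.histogram(true_flux[detected], bins=bin_edges)
    centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    with np.errstate(divide="ignore", invalid="ignore"):
        frac = n_found / n_all                      # NaN where a bin is empty
    return centres, frac

def flux_at_completeness(centres, frac, level=0.5):
    """Interpolate linearly to the flux where completeness first reaches `level`."""
    good = ~np.isnan(frac)
    c, f = centres[good], frac[good]
    above = np.flatnonzero(f >= level)
    if above.size == 0:
        return float("nan")
    k = above[0]
    if k == 0:
        return float(c[0])
    return float(c[k - 1] + (level - f[k - 1]) * (c[k] - c[k - 1]) / (f[k] - f[k - 1]))

flux = [0.2, 0.4, 0.6, 0.8, 1.2, 1.4, 1.6, 1.8]            # toy input fluxes
found = [False, False, False, True, True, True, True, True]
centres, frac = completeness_curve(flux, found, [0.0, 0.5, 1.0, 1.5, 2.0])
f50 = flux_at_completeness(centres, frac)
```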

Table 6: Relevant Duchamp input parameters set in the different input parameter files for the model based on real WHISP galaxies and WSRT noise. The default values of Duchamp were used for most of the other parameters. The last two rows list the overall completeness and reliability achieved by Duchamp. A dash indicates that no value was set.

Run                      1     2     3     4     5     6     7     8     9    10    11
threshold (σ)          1.0   1.5   2.0   2.5   3.0   3.5   2.5   3.0   3.5   2.5   3.0
growthThreshold (σ)      –   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
minPix                  25    25    25    25    25    25    25    25    25    25    25
minChannels              5     5     5     5     5     5     5     5     5     5     5
reconDim                 1     1     1     1     1     1     1     1     1     3     3
snrRecon (σ)           4.0   4.0   4.0   4.0   4.0   4.0   3.0   3.0   3.0   4.0   4.0
scaleMin                 3     3     3     3     3     3     3     3     3     3     3
Completeness (%)        44    44    44    43    40    38    59    50    41    49    41
Reliability (%)         55    55    63    73    93   100    10    30    93    12    88

The best-performing parameter set in terms of completeness, run 7, produces a completeness of 50% at an integrated flux of F_int ≈ 0.7 Jy km s^-1, corresponding to an H I mass of 1.7 × 10^5 M_⊙ at a distance of 1 Mpc, or 1.7 × 10^9 M_⊙ at 100 Mpc. This is worse than what we achieved for the point sources with Gaussian line profiles in Section 4, but significantly better than the outcome for the model galaxies in Section 5. The better performance may be due to the artificially redshifted WHISP galaxies being generally much more compact than the model galaxies created for the tests in Section 5. As with any threshold-based source finder, compact sources are easier to detect than extended sources, even with prior wavelet reconstruction or smoothing. As the spectral profiles of the WHISP galaxies are generally broad and complex, performance is worse than in the case of the point source models in Section 4, which had much simpler and narrower Gaussian lines.
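The H I masses quoted above follow from the standard optically thin relation M_HI = 2.356 × 10^5 D² F_int, with M_HI in M_⊙, D in Mpc, and F_int in Jy km s^-1. The sketch below reproduces the numbers; the function name is our own, D is treated simply as the distance, and the small (1 + z) corrections that matter at higher redshift are ignored.

```python
def hi_mass_msun(flux_jy_kms, distance_mpc):
    """Optically thin H I mass in solar masses: 2.356e5 * D^2 * F_int."""
    return 2.356e5 * distance_mpc ** 2 * flux_jy_kms

print(f"{hi_mass_msun(0.7, 1.0):.2e}")     # 1.65e+05
print(f"{hi_mass_msun(0.7, 100.0):.2e}")   # 1.65e+09
print(f"{hi_mass_msun(0.1, 100.0):.2e}")   # 2.36e+08
```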

The overall reliability in the case of run 7 is very low, at only 10%. This figure, however, is the raw reliability achieved by Duchamp and can be substantially improved by filtering sources based on their measured parameters. False detections are usually the result of noise peaks being picked up by the source finder. A large fraction of those false detections will be characterised by very low integrated fluxes and small line widths, and often a simple cut in flux–line width space will remove more than 95% of false detections while retaining more than 95% of genuine detections. This is illustrated and discussed in more detail in Section 7.1.

In summary, when running Duchamp on a realistic data cube with real galaxies at a redshift of about 0.03 and genuine noise extracted from observational data taken with the WSRT, the software performs as expected, with completeness levels between those achieved for the compact and extended model sources discussed in the previous sections. This result illustrates that the performance of Duchamp, as with any source finder based on flux thresholding, will strongly depend on the morphology and extent of the sources to be detected. Even with multi-scale wavelet reconstruction, Duchamp is more likely to uncover compact sources than sources that are significantly extended, either spatially or spectrally.

At the same time, the performance of Duchamp 
does not seem to be hampered by the fact that we are 
dealing with real telescope data and noise, as the com- 
pleteness and reliability levels reported in Table 6 are 
generally very similar to what we achieved with the 
model sources discussed in the previous sections. This 
is presumably due to the excellent quality of the Westerbork data, which do not contain any obvious artefacts such as interference or residual continuum emission.



7 Discussion 

In general, Duchamp does what it promises to do. It 
is able to reliably detect sources down to low signal- 
to-noise ratios and accurately determine their posi- 
tion and radial velocity. These are the most funda- 
mental requirements for any source finder. Our tests 
also demonstrated that, by using and fine-tuning the options of 'à trous' wavelet reconstruction and growing of sources to lower flux levels, the performance of Duchamp can be greatly enhanced.

7.1 Improving Reliability 

The reliability figures reported throughout this pa- 
per have all been "raw" reliabilities, i.e. reliabilities as 
achieved by Duchamp prior to any filtering of the out- 
put source catalogue. The user would normally wish 
to substantially improve these through appropriate fil- 
tering of the source catalogue based on the source pa- 
rameters as measured by Duchamp. 

The left-hand panel of Figure 21 shows the measured integrated flux plotted against measured line width for all genuine (black data points) and false (red data points) detections found by Duchamp in the point source models discussed in Section 4. Genuine and false detections clearly occupy largely disjoint regions of (F_int, w_50) parameter space, with false detections generally occurring near the low end of the integrated flux range. Similar plots can be generated for other combinations of source parameters, but F_int and w_50 usually provide the best distinction between genuine and false detections.

The easiest way to improve the reliability of Duchamp's source finding results is to simply apply a cut in F_int to exclude most false detections while retaining most of the genuine sources. In our example, applying a cut at F_int = 40 mJy km s^-1 discards 97.2% of all false detections while retaining 96.9% of all genuine sources, thereby increasing the overall reliability from 77.1% to 99.2% while only moderately decreasing the overall completeness from 83.0% to 80.6%.
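The effect of such a cut on completeness and reliability follows from simple counting. The sketch below applies a minimum-F_int cut to a catalogue in which each detection is flagged genuine or false (as after cross-matching), and a second helper predicts the reliability after a cut from the raw reliability and the fractions of genuine and false detections retained. Both function names are hypothetical. Applied to the rounded percentages quoted above, the helper gives about 99.1%, consistent with the quoted 99.2% given that its inputs are themselves rounded.

```python
import numpy as np

def apply_flux_cut(flux, genuine, n_true, flux_cut):
    """Keep detections with F_int >= flux_cut; return mask, completeness, reliability."""
    flux = np.asarray(flux, dtype=float)
    genuine = np.asarray(genuine, dtype=bool)
    keep = flux >= flux_cut
    n_kept = int(keep.sum())
    n_genuine_kept = int((keep & genuine).sum())
    completeness = n_genuine_kept / n_true
    reliability = n_genuine_kept / n_kept if n_kept else float("nan")
    return keep, completeness, reliability

def reliability_after_cut(raw_reliability, frac_genuine_kept, frac_false_kept):
    """Reliability after a cut, from the raw reliability and the retained fractions."""
    genuine = raw_reliability * frac_genuine_kept
    spurious = (1.0 - raw_reliability) * frac_false_kept
    return genuine / (genuine + spurious)

# Raw reliability 77.1%; 96.9% of genuine and 100% - 97.2% of false detections retained.
print(round(reliability_after_cut(0.771, 0.969, 1.0 - 0.972), 3))   # 0.991

# Toy catalogue (fluxes in Jy km/s), 4 input sources, cut at 0.04 Jy km/s.
keep, compl, rel = apply_flux_cut([0.01, 0.02, 0.05, 0.1, 0.2],
                                  [False, False, True, True, True], 4, 0.04)
```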

A similar cut can be applied to the results from run 7 on the test data cube containing artificially redshifted WHISP galaxies, as plotted in the right-hand panel of Figure 21. Again, applying a simple flux threshold of 0.5 Jy km s^-1 improves reliability from 10% to 84%, while only moderately decreasing completeness from 59% to 50%. The method is not quite as successful as for the point sources, as we are now dealing with real galaxies and real noise with interference and artefacts, but a significant improvement in reliability can nevertheless be achieved without any severe impact on the number of genuine detections.

Figure 19: Completeness (filled circles, solid lines) and reliability (open circles, dashed lines) of Duchamp on the WSRT model cube with WHISP galaxies for different flux thresholds and input parameters. The colours, as shown in the legend, distinguish the different wavelet reconstruction modes (one-dimensional versus three-dimensional) and wavelet reconstruction thresholds (3σ versus 4σ) used in the tests (see Table 6 for details). The numbers alongside the data points refer to the corresponding runs as listed in Table 6.

This simple example illustrates that the "raw" reli- 
ability figures quoted throughout this paper should not 
be considered as the final numbers. Reliability can be 
greatly improved through very basic filtering in param- 
eter space of the Duchamp output catalogue. In prin- 
ciple, this applies to the output of almost any source 
finder. Alternatively, instead of removing sources from 
the output catalogue, it may be desirable to calculate 
a reliability number for each catalogue entry based on 
the source's location in parameter space and leave it 
to the catalogue's users to decide as part of their sci- 
entific analysis at which reliability level they wish to 
make the cut. 
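One possible way to attach such a reliability number, sketched here under our own assumptions, is to bin a labelled set of detections (for example from model cubes in which genuine and false detections are known) in (log F_int, log w_50) space, take the genuine fraction in each bin as the reliability, and look up the bin of each catalogue entry. The bin edges and function names are hypothetical. In practice each bin needs enough detections for the fraction to be meaningful; entries falling outside the edges are assigned to the nearest edge bin, and empty bins yield NaN.

```python
import numpy as np

def reliability_map(log_flux, log_w50, genuine, flux_edges, w50_edges):
    """Genuine fraction of labelled detections in each (log F_int, log w_50) bin."""
    log_flux = np.asarray(log_flux, dtype=float)
    log_w50 = np.asarray(log_w50, dtype=float)
    genuine = np.asarray(genuine, dtype=bool)
    bins = [flux_edges, w50_edges]
    n_all, _, _ = np.histogram2d(log_flux, log_w50, bins=bins)
    n_gen, _, _ = np.histogram2d(log_flux[genuine], log_w50[genuine], bins=bins)
    with np.errstate(divide="ignore", invalid="ignore"):
        return n_gen / n_all                        # NaN for empty bins

def lookup_reliability(rel_map, log_flux, log_w50, flux_edges, w50_edges):
    """Assign each catalogue entry the reliability of its bin (indices clipped to the grid)."""
    ix = np.clip(np.digitize(log_flux, flux_edges) - 1, 0, len(flux_edges) - 2)
    iy = np.clip(np.digitize(log_w50, w50_edges) - 1, 0, len(w50_edges) - 2)
    return rel_map[ix, iy]

# Toy labelled training set: faint, narrow detections are mostly false.
train_flux = [-2.0, -2.0, -2.0, 0.0, 0.0, 0.0]
train_w50 = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
train_genuine = [False, False, True, True, True, True]
f_edges, w_edges = [-3.0, -1.0, 1.0], [0.0, 1.5, 3.0]
rmap = reliability_map(train_flux, train_w50, train_genuine, f_edges, w_edges)
entry_rel = lookup_reliability(rmap, [-2.5, 0.5], [1.2, 2.5], f_edges, w_edges)
```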

7.2 Source Parametrisation Issues 

When it comes to source parametrisation, the measurements provided by Duchamp are affected by several systematic errors. These systematic errors are not due to errors in the software itself, but are a consequence of the presence of noise in the data as well as of the methods and algorithms used for measuring source parameters.



Figure 20: Completeness as a function of in- 
tegrated flux for selected runs (see legend) of 
Duchamp on the WSRT model cube with WHISP 
galaxies. The choice of colours is the same as in 
Figure 19. 



Spectral line widths determined by Duchamp are generally very accurate and not much affected by noise-induced systematic errors as far as the w_50 parameter is concerned. The two other line width parameters calculated by Duchamp, w_20 and w_VEL, appear to be systematically too large over a wide range of signal-to-noise ratios and should not be used unless explicitly required in special, well-defined circumstances.
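The w_50 and w_20 widths are defined relative to the peak of the integrated spectrum. Below is a minimal sketch that assumes the widths are measured between the outermost channels at or above the given fraction of the peak, with linear interpolation to the crossing points. It follows the spirit of, but is not necessarily identical to, Duchamp's algorithm, and the function name is our own. Because the outermost channels above the level are used, a positive noise spike in or beyond the line wings can widen the result, and noise reaches a lower level more easily. This may be one reason why w_20 is more affected than w_50.

```python
import numpy as np

def width_at_fraction(spectrum, frac, channel_width=1.0):
    """Width between the outermost crossings of frac * peak, linearly interpolated."""
    s = np.asarray(spectrum, dtype=float)
    level = frac * s.max()
    above = np.flatnonzero(s >= level)
    lo, hi = above[0], above[-1]            # outermost channels at or above the level
    # Interpolate between the last channel below the level and the first one above it.
    left = lo - (s[lo] - level) / (s[lo] - s[lo - 1]) if lo > 0 else float(lo)
    right = hi + (s[hi] - level) / (s[hi] - s[hi + 1]) if hi < s.size - 1 else float(hi)
    return (right - left) * channel_width

chan = np.arange(201)
profile = np.exp(-0.5 * ((chan - 100) / 10.0) ** 2)   # noise-free Gaussian, sigma = 10
w50 = width_at_fraction(profile, 0.5)   # close to 2.3548 * 10 channels
w20 = width_at_fraction(profile, 0.2)   # close to 3.5882 * 10 channels
```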

Peak fluxes, as reported by Duchamp, are in gen- 
eral slightly too large for bright sources and signifi- 
cantly too large (on a relative scale) for faint sources. 
This is due to the fact that Duchamp determines the 
peak flux by simply selecting the value of the brightest 
pixel encountered. This method introduces a bias to- 
wards positive noise peaks sitting on top of the bright- 
est region of a source, and hence, in the presence of 
noise, peak fluxes measured by Duchamp will be sys- 
tematically too high. 
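The bias of the brightest-pixel estimator can be reproduced with a small Monte Carlo experiment. The sketch below, using made-up line and noise parameters, adds Gaussian noise to a Gaussian line many times and records the brightest channel within two line widths of the centre; the mean of these maxima lies above the true peak, as described above.

```python
import numpy as np

rng = np.random.default_rng(2012)
true_peak, sigma_line, noise_rms = 3.0, 5.0, 1.0    # illustrative values only
chan = np.arange(-50, 51)
line = true_peak * np.exp(-0.5 * (chan / sigma_line) ** 2)
inside = np.abs(chan) <= 2 * sigma_line             # channels treated as part of the source

# Brightest-pixel peak estimate in many independent noise realisations.
n_trials = 2000
measured = np.array([
    (line + rng.normal(0.0, noise_rms, chan.size))[inside].max()
    for _ in range(n_trials)
])
bias = measured.mean() - true_peak
print(f"mean measured peak {measured.mean():.2f} vs true {true_peak:.2f}")
```

Fitting a model profile to several channels, rather than taking a single maximum, is one way to reduce this bias.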

Integrated fluxes determined by Duchamp are significantly and systematically too small, in particular for faint sources. This is likely caused by the fact that Duchamp simply sums over the flux of discrete elements above a given threshold to determine the integrated flux, thereby missing the flux from elements below the threshold. Hence, the raw integrated flux measurements currently provided by Duchamp are not useful and need to be corrected to compensate for the systematic offset. This issue is particularly important because many scientific projects, including the ASKAP survey science projects WALLABY and DINGO^ (Meyer 2009), rely on accurate flux measurements, for example for determining the H I mass function of galaxies.

^Deep Investigation of Neutral Gas Origins; principal investigator: Martin Meyer; public website: http://internal.physics.uwa.edu.au/~mmeyer/dingo/

Finally, a particular problem in the case of galaxies is that under certain circumstances galaxies either get broken up into multiple detections or only one half of a galaxy is detected. This problem mainly affects faint, edge-on galaxies with broad spectral profiles that are partly hidden in the noise, and it results in systematic errors in the measurements of essentially all source parameters, including basic parameters such as position and radial velocity. Such cases of multiple or partial detections must be identified and treated separately to prevent biases in any scientific analysis based on the source finding results.

Figure 21: Measured integrated flux, F_int, versus measured line width, w_50, of all genuine (black) and false (red) detections made by Duchamp in the point source models with Gaussian line profiles (left) and the test cube with artificially redshifted WHISP galaxies (right). The dashed black lines indicate the flux levels of 0.04 and 0.5 Jy km s^-1 used to filter false detections.

8 Summary 

In this paper we present and discuss the results of basic, three-dimensional source finding tests with Duchamp, the standard source finder for the Australian SKA Pathfinder, using different sets of unresolved and extended H I model sources as well as a data set of real galaxies and noise obtained from H I observations with the WSRT.

Overall, Duchamp appears to be a successful, general-purpose source finder capable of reliably detecting sources down to low signal-to-noise ratios and accurately determining their position and velocity. In the case of point sources with simple Gaussian spectral lines we achieve a completeness of about 50% at a peak signal-to-noise ratio of 3 and an integrated flux level of about 0.1 Jy km s^-1. The latter corresponds to an H I mass sensitivity of about 2 × 10^8 M_⊙ at a distance of 100 Mpc, which is slightly better than what the WALLABY project is expected to achieve for real galaxies (Koribalski & Staveley-Smith 2009). The situation is less ideal for extended sources with double-horn profiles. In this case we achieve 50% completeness at an integrated flux level of about 2.5 Jy km s^-1 for the model galaxies and 0.7 Jy km s^-1 for the WHISP galaxies. The latter is equivalent to an H I mass sensitivity of about 1.7 × 10^9 M_⊙ at a distance of 100 Mpc, illustrating that the performance of Duchamp, as well as of any other source finder, will strongly depend on source morphology. However, these figures may well be improved by carefully optimising the various input parameters offered by Duchamp.

In its current state Duchamp is not particularly 
successful in parametrising sources in the presence of 
noise in the data cube, and other, external algorithms 
for source parametrisation should be considered in- 
stead. It appears, however, that most, if not all, para- 
metrisation issues are due to intrinsic limitations in the 
implemented algorithms themselves and not due to er- 
rors in their implementation, suggesting that most of 
the problems can in principle be solved by implement- 
ing more sophisticated parametrisation algorithms in 
Duchamp. Alternatively, corrections would have to be 
applied to all parameters derived by Duchamp to com- 
pensate for systematic errors. Such corrections, how- 
ever, would have to be highly specialised and tailored 
to the particular survey and source type concerned. 